We consider a bivariate stationary Markov chain $(X_n, Y_n)_{n\ge 0}$ in a Polish state space, where only the process $(Y_n)_{n\ge 0}$ is presumed to be observable. The goal of this paper is to investigate the ergodic theory and stability properties of the measure-valued process $(\Pi_n)_{n\ge 0}$, where $\Pi_n$ is the conditional distribution of $X_n$ given $Y_0, \ldots, Y_n$. We show that the ergodic and stability properties of $(\Pi_n)_{n\ge 0}$ are inherited from the ergodicity of the unobserved process $(X_n)_{n\ge 0}$ provided that the Markov chain $(X_n, Y_n)_{n\ge 0}$ is nondegenerate, that is, its transition kernel is equivalent to the product of independent transition kernels. Our main results generalize, subsume and in some cases correct previous results on the ergodic theory of nonlinear filters.

1. Introduction. In this paper we will consider a bivariate Markov chain $(X_n, Y_n)_{n\ge 0}$ taking values in a Polish state space. Only the process $(Y_n)_{n\ge 0}$ is presumed to be directly observable to us, and we aim to estimate the state $X_n$ of the unobserved process given the observed data $Y_0, \ldots, Y_n$ to date. This is the quintessential setup in problems with partial information, and models of this type can therefore be found in a wide range of applications [6].

We will be concerned, in particular, with the ergodic theory and stability properties of the measure-valued process $(\Pi_n)_{n\ge 0}$ defined by the conditional distributions $\Pi_n = \mathbf{P}(X_n \in \cdot \mid Y_0, \ldots, Y_n)$, which is called the nonlinear filter. It is not difficult to show that, in general, the processes $(\Pi_n, Y_n)_{n\ge 0}$ as well as $(\Pi_n, X_n, Y_n)_{n\ge 0}$ are themselves Markovian, and a typical question that we will aim to answer is whether ergodicity of the underlying model $(X_n, Y_n)_{n\ge 0}$ implies ergodicity of the extended Markov chain $(\Pi_n, X_n, Y_n)_{n\ge 0}$ in a suitable sense. Questions of this type date back at least to the work of Blackwell [2] and Kunita [14]. Besides the intrinsic probabilistic interest in the development of a conditional ergodic theory of Markov chains, ergodicity of the filter has substantial practical relevance to understanding the performance of nonlinear filtering and its numerical approximations over a long time horizon; cf. [5, 14, 21], and see [8, 20] for further references.

Fig. 1. Dependence structure of (a) a classical hidden Markov model [6]; (b) a generalized hidden Markov model [10, 11]; (c) a hidden Markov model with correlated noise [3]; (d) a general Markov model.

Much of the literature on the topic of this paper is concerned with the setting of a classical hidden Markov model whose dependence structure is illustrated in Figure 1(a); here the unobserved process $(X_n)_{n\ge 0}$ is assumed to be itself Markovian and the observations $(Y_n)_{n\ge 0}$ are conditionally independent.^1 In this special case $(\Pi_n)_{n\ge 0}$ is also Markovian, and two basic questions have been considered.

^1 The continuous time version of this model, known as a Markov additive process, is also widely studied in the literature in various special cases (such as white noise or counting observations; see [25] for a unified view). We have restricted ourselves in this paper to discrete time models for simplicity. All our results are easily extended to the continuous time setting as in [20], Section 6.

(1) Does $(\Pi_n)_{n\ge 0}$ possess a unique invariant measure, assuming $(X_n)_{n\ge 0}$ does?

For the second question, let $\mathbf{P}$ and $\bar{\mathbf{P}}$ be the laws of the Markov chain $(X_n, Y_n)_{n\ge 0}$ with initial laws $\mathbf{P}(X_0 \in \cdot) \ll \bar{\mathbf{P}}(X_0 \in \cdot)$, and let $\bar\Pi_n = \bar{\mathbf{P}}(X_n \in \cdot \mid Y_0, \ldots, Y_n)$.

(2) Is $(\Pi_n)_{n\ge 0}$ asymptotically stable in the sense that $|\Pi_n(f) - \bar\Pi_n(f)| \to 0$ in $\mathbf{P}$-probability for every bounded continuous function $f$?

These and related questions were studied in great generality by Kunita [14, 15], Stettner [19], and Ocone and Pardoux [17] (see [1, 4, 8, 20] for further references). Kunita and Stettner state that the answer to the first question is affirmative provided that the stationary process $(X_n)_{n\in\mathbb{Z}}$ is purely nondeterministic, that is,

$$\bigcap_{n\ge 0} \mathcal{F}^X_{-n} \ \text{is } \mathbf{P}\text{-trivial},$$

where $\mathbf{P}$ is the stationary law of the two-sided process $(X_n, Y_n)_{n\in\mathbb{Z}}$ and $\mathcal{F}^X_n = \sigma\{X_k : -\infty < k \le n\}$. Ocone and Pardoux state that the answer to the second question is affirmative under the same assumption. Unfortunately, the proofs of these results contain a serious error, as was pointed out by Baxendale, Chigansky and Liptser [1]. Indeed, the crucial step in the proofs is the identity

$$\bigcap_{n\ge 0} \mathcal{F}^Y_0 \vee \mathcal{F}^X_{-n} = \mathcal{F}^Y_0, \quad \mathbf{P}\text{-a.s.},$$

where $\mathcal{F}^Y_0 = \sigma\{Y_k : -\infty < k \le 0\}$. It is tempting to exchange the order of the intersection $\bigcap$ and supremum $\vee$ of $\sigma$-fields, which would allow us to conclude this identity from the assumption that $(X_n)_{n\in\mathbb{Z}}$ is purely nondeterministic. But such an exchange cannot be taken for granted (see [7], page 30) and requires proof. In the filtering setting, various counterexamples given in [1, 22] show that the answers to the above questions may indeed be negative even when $(X_n)_{n\in\mathbb{Z}}$ is purely nondeterministic, in contradiction with the conclusions of [14, 15, 17, 19].

Before we proceed, let us briefly recall a simple counterexample from 
[1, 22] that will be helpful in understanding the problems addressed in this 
paper. 

Example 1.1. Let $(\xi_n)_{n\in\mathbb{Z}}$ be an i.i.d. sequence of random variables taking the values $\{0, 1\}$ with equal probability under $\mathbf{P}$, and define

$$X_n = (\xi_n, \xi_{n+1}), \qquad Y_n = |\xi_{n+1} - \xi_n|.$$

Then $(X_n)_{n\in\mathbb{Z}}$ is an ergodic Markov chain in $\{00, 01, 10, 11\}$ that is purely nondeterministic by the Kolmogorov zero-one law, and $(X_n, Y_n)_{n\ge 0}$ is a hidden Markov model as in Figure 1(a). But clearly $\xi_n = (\xi_0 + Y_0 + \cdots + Y_{n-1}) \bmod 2$, so that

$$\bar\Pi_n(f) = f(X_n), \qquad \Pi_n(f) = \frac{f(X_n) + f(11 - X_n)}{2}, \quad \bar{\mathbf{P}}\text{-a.s.},$$

where we defined $\bar{\mathbf{P}}(\cdot) = \mathbf{P}(\cdot \mid X_0 = 00)$. Thus the filter is not asymptotically stable, and one may similarly establish that it admits distinct invariant measures.
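The failure of stability in Example 1.1 can be checked numerically. The following is a minimal sketch, not taken from the paper: it runs the exact finite-state filter for the chain $X_n = (\xi_n, \xi_{n+1})$ with the noiseless observation $Y_n = |\xi_{n+1} - \xi_n|$, once from the point mass at 00 and once from the uniform (stationary) law. All function names (`transition`, `likelihood`, `run_filter`, `simulate`, `l1_distance`) are hypothetical helpers introduced for illustration, and the path is sampled started from $X_0 = 00$.

```python
import random

# States of X_n = (xi_n, xi_{n+1}), written 00, 01, 10, 11 in the text.
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def transition(s, t):
    """P(X_{n+1} = t | X_n = s): t must start with the second bit of s."""
    return 0.5 if t[0] == s[1] else 0.0


def likelihood(s, y):
    """Noiseless observation: Y_n = |xi_{n+1} - xi_n| is a function of X_n."""
    return 1.0 if abs(s[1] - s[0]) == y else 0.0


def run_filter(prior, ys):
    """Return the conditional laws of X_n given Y_0..Y_n for each n."""
    posteriors = []
    predicted = dict(prior)
    for y in ys:
        weights = {s: predicted[s] * likelihood(s, y) for s in STATES}
        total = sum(weights.values())
        post = {s: weights[s] / total for s in STATES}
        posteriors.append(post)
        # One-step prediction for the next time index.
        predicted = {t: sum(post[s] * transition(s, t) for s in STATES)
                     for t in STATES}
    return posteriors


def simulate(n_steps, seed=0):
    """Sample a path started from X_0 = 00 (an illustrative choice)."""
    rng = random.Random(seed)
    xi = [0, 0] + [rng.randint(0, 1) for _ in range(n_steps)]
    xs = [(xi[n], xi[n + 1]) for n in range(n_steps + 1)]
    ys = [abs(b - a) for a, b in xs]
    return xs, ys


def l1_distance(p, q):
    """Sum of absolute differences between two laws on STATES."""
    return sum(abs(p[s] - q[s]) for s in STATES)


xs, ys = simulate(50)
point_prior = {s: 1.0 if s == (0, 0) else 0.0 for s in STATES}
uniform_prior = {s: 0.25 for s in STATES}
conditioned = run_filter(point_prior, ys)   # filter started from X_0 = 00
stationary = run_filter(uniform_prior, ys)  # filter started from the uniform law
distances = [l1_distance(p, q) for p, q in zip(conditioned, stationary)]
```

In this sketch the filter started from 00 tracks $X_n$ exactly, while the filter started from the uniform law splits its mass evenly between $X_n$ and its bitwise complement, so the distance between the two never decreases.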

One feature of the model of Example 1.1 is that it possesses degenerate observations in the sense that $Y_n$ is a function of $X_n$ without any additional noise. The phenomenon illustrated here turns out to disappear when some independent noise is added to the observations, for example, $Y_n = |\xi_{n+1} - \xi_n| + \eta_n$ where $(\eta_n)_{n\in\mathbb{Z}}$ is an i.i.d. sequence such that the law of $\eta_0$ has a nowhere vanishing density. In [20], one of the authors developed this idea to establish ergodicity and stability properties of the nonlinear filter under very general assumptions. To this end, let $(X_n, Y_n)_{n\in\mathbb{Z}}$ be a stationary hidden Markov model under $\mathbf{P}$, and assume that:

(1) $(X_n)_{n\in\mathbb{Z}}$ is absolutely regular: $\mathbf{E}(\|\mathbf{P}(X_n \in \cdot \mid X_0) - \mathbf{P}(X_n \in \cdot)\|_{\mathrm{TV}}) \to 0$ as $n \to \infty$.

(2) The observations are nondegenerate: $\mathbf{P}(Y_n \in A \mid X_n) = \int_A g(X_n, y)\,\varphi(dy)$ for some strictly positive density $g(x, y) > 0$ and reference measure $\varphi$.

Then the above exchange of intersection and supremum of $\sigma$-fields is permitted, and the filter is stable [20] and uniquely ergodic [22]. Intuitively, nondegeneracy (which formalizes the notion of "noisy" observations) rules out the singular observation structure that causes the exchange of intersection and supremum to fail in Example 1.1. However, this intuition should not be taken too literally, as a more difficult example in [22] shows that the result may still fail if absolute regularity is replaced with the weaker purely nondeterministic assumption. Therefore, the assumptions in [14, 15, 17, 19] (which implicitly assume nondegeneracy, though this is not used in the proofs) are genuinely too weak to yield the desired results.
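The effect of adding observation noise can be illustrated with the same finite-state chain. The sketch below is an illustration under stated assumptions, not the construction of [20]: it takes $\eta_n$ to be standard Gaussian (the constant `SIGMA` is an assumed noise level), runs the exact filter from the point mass at 00 and from the uniform law on one simulated observation path, and records the distance between the two filters. The helper names are hypothetical.

```python
import math
import random

STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # values of X_n = (xi_n, xi_{n+1})
SIGMA = 1.0  # assumed standard deviation of the observation noise eta_n


def transition(s, t):
    """P(X_{n+1} = t | X_n = s): t must start with the second bit of s."""
    return 0.5 if t[0] == s[1] else 0.0


def likelihood(s, y):
    """Strictly positive Gaussian density of Y_n given X_n = s (up to a constant)."""
    mean = abs(s[1] - s[0])
    return math.exp(-((y - mean) ** 2) / (2.0 * SIGMA ** 2))


def run_filter(prior, ys):
    """Return the conditional laws of X_n given Y_0..Y_n for each n."""
    posteriors = []
    predicted = dict(prior)
    for y in ys:
        weights = {s: predicted[s] * likelihood(s, y) for s in STATES}
        total = sum(weights.values())
        post = {s: weights[s] / total for s in STATES}
        posteriors.append(post)
        predicted = {t: sum(post[s] * transition(s, t) for s in STATES)
                     for t in STATES}
    return posteriors


def simulate_observations(n_steps, seed=0):
    """Sample Y_n = |xi_{n+1} - xi_n| + eta_n along a path started from X_0 = 00."""
    rng = random.Random(seed)
    xi = [0, 0] + [rng.randint(0, 1) for _ in range(n_steps)]
    return [abs(xi[n + 1] - xi[n]) + rng.gauss(0.0, SIGMA)
            for n in range(n_steps + 1)]


def l1_distance(p, q):
    """Sum of absolute differences between two laws on STATES."""
    return sum(abs(p[s] - q[s]) for s in STATES)


ys = simulate_observations(200)
point_prior = {s: 1.0 if s == (0, 0) else 0.0 for s in STATES}
uniform_prior = {s: 0.25 for s in STATES}
distances = [l1_distance(p, q)
             for p, q in zip(run_filter(point_prior, ys),
                             run_filter(uniform_prior, ys))]
```

Comparing `distances[0]` with `distances[-1]` shows the two filters merging along the sampled path, in contrast with the noiseless case, where the distance stays constant.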

The results discussed above all assume the classical hidden Markov model setting illustrated in Figure 1(a). Such models are quite flexible and appear in a wide array of applications [6]. Nonetheless, there are many applications in which the need arises for more general classes of partially observed Markov models. For example, two common generalizations of the classical hidden Markov model are illustrated in Figure 1(b) and (c). The model of Figure 1(b) is a generalized hidden Markov model [10] or an autoregressive process with Markov regime [11]. This model is similar to a hidden Markov model in that the dynamics of $(X_n)_{n\ge 0}$ do not depend on the observations $(Y_n)_{n\ge 0}$; however, here the observations are not conditionally independent but may possess their own dynamics. Such models are common in financial mathematics, where $(Y_n)_{n\ge 0}$ might represent a sequence of investment returns while $(X_n)_{n\ge 0}$ models the state of the underlying economy. On the other hand, in the model of Figure 1(c) there is feedback from the observations to the dynamics of the unobserved process $(X_n)_{n\ge 0}$. Such models arise when the noise driving the unobserved process and the observation noise are correlated.

In these more general models, the process $(\Pi_n)_{n\ge 0}$ is no longer Markovian, but the pair $(\Pi_n, Y_n)_{n\ge 0}$ is still Markov. It is therefore natural, and of significant interest for applications, to investigate the ergodicity of $(\Pi_n, Y_n)_{n\ge 0}$ and the asymptotic stability of $(\Pi_n)_{n\ge 0}$ in a more general setting. It has been shown by Di Masi and Stettner [10] for the model of Figure 1(b), and by Budhiraja [3] for the model of Figure 1(c), that these problems can be reduced to establishing the validity of the exchange of intersection and supremum of $\sigma$-fields along the lines of the earlier approach for classical hidden Markov models in [14, 15, 17, 19]. The generalization of the positive results in [20] is far from straightforward, however.

To illustrate one of the complications that arises in generalized models, let us consider the setting of Budhiraja [3]. Budhiraja considers a model of the form

$$X_n = f(X_{n-1}, Y_{n-1}, \xi_n), \qquad Y_n = h(X_n) + \eta_n,$$

where $(\xi_n)_{n\ge 1}$ and $(\eta_n)_{n\ge 0}$ are independent i.i.d. sequences. It is assumed that $f, h$ are continuous functions and that $\eta_0$ possesses a bounded and continuous density with respect to some reference measure $\varphi$. This is evidently a hidden Markov model with correlated noise of the type illustrated in Figure 1(c). The main result in [3] states that if this model admits a unique stationary law $\mathbf{P}$ and if $(X_n)_{n\in\mathbb{Z}}$ is purely nondeterministic, then $(\Pi_n, Y_n)_{n\ge 0}$ possesses a unique invariant measure. Budhiraja's proof contains the same gap as in [14, 15, 19]; indeed, the result is clearly erroneous in light of Example 1.1. Nonetheless, it seems reasonable to guess that if we assume nondegeneracy of the observations (i.e., that the density of $\eta_0$ is strictly positive) and absolute regularity of the unobserved process, then the result will hold as in [20]. Even this, however, turns out to be false.

Example 1.2. Define the $\{00, 01, 10, 11\}$-valued process $(X_n)_{n\in\mathbb{Z}}$ and real-valued process $(Y_n)_{n\in\mathbb{Z}}$ such that $X_0$ is uniformly distributed in $\{00, 01, 10, 11\}$,

$$(X^1_n, X^2_n) = \bigl(X^2_{n-1},\ |X^2_{n-1} - I_{[0,\infty[}(Y_{n-1})|\bigr), \qquad Y_n = \eta_n,$$

where $(\eta_n)_{n\in\mathbb{Z}}$ are i.i.d. $N(0, 1)$-distributed random variables. Then the process $(X_n, I_{[0,\infty[}(Y_{n-1}))_{n\in\mathbb{Z}}$ has the same law as the classical hidden Markov model of Example 1.1, so stability and unique ergodicity of the filter must fail.
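The reduction of Example 1.2 to Example 1.1 rests on two pathwise identities: $X^1_n = X^2_{n-1}$ and $I_{[0,\infty[}(Y_{n-1}) = |X^2_n - X^1_n|$. A short simulation can confirm them. This is a sketch for illustration only; `simulate_example_1_2` is a hypothetical helper name, and the seed and path length are arbitrary.

```python
import random


def simulate_example_1_2(n_steps, seed=0):
    """Simulate (X_n, Y_n) for n = 0..n_steps following the recursion of Example 1.2."""
    rng = random.Random(seed)
    xs = [(rng.randint(0, 1), rng.randint(0, 1))]  # X_0 uniform on {00, 01, 10, 11}
    ys = [rng.gauss(0.0, 1.0)]                     # Y_0 = eta_0 ~ N(0, 1)
    for _ in range(n_steps):
        indicator = 1 if ys[-1] >= 0.0 else 0      # I_{[0,inf[}(Y_{n-1})
        prev_second = xs[-1][1]
        xs.append((prev_second, abs(prev_second - indicator)))
        ys.append(rng.gauss(0.0, 1.0))             # Y_n = eta_n
    return xs, ys


xs, ys = simulate_example_1_2(1000)

# The "effective observation" attached to X_n is the sign indicator of Y_{n-1}.
effective_obs = [1 if ys[n - 1] >= 0.0 else 0 for n in range(1, len(xs))]
```

Here `effective_obs[n - 1]` plays the role of the noiseless observation $|\xi_{n+1} - \xi_n|$ of Example 1.1, even though each $Y_n$ itself has a strictly positive Gaussian density.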

Even though the observations are ostensibly nondegenerate in this example, the feedback from the observations affects the dynamics of the unobserved process in a singular fashion that recreates the problems of Example 1.1. We thus need at least a different notion of nondegeneracy in order to rule out such phenomena.

The goal of this paper is to develop a general ergodic and stability theory for nonlinear filters that subsumes all of the models discussed above. Indeed, we do not impose any structural assumptions other than that $(X_n, Y_n)_{n\ge 0}$ is a Markov chain that possesses a stationary law $\mathbf{P}$ [as is illustrated in Figure 1(d)]. The main assumptions of this paper generalize those of [20]; we assume that the model is:

(1) absolutely regular: $\mathbf{E}(\|\mathbf{P}((X_n, Y_n) \in \cdot \mid X_0, Y_0) - \mathbf{P}((X_n, Y_n) \in \cdot)\|_{\mathrm{TV}}) \to 0$ as $n \to \infty$;

(2) nondegenerate: there exist kernels $P_0, Q$ and a density $g(x', y', x, y) > 0$ so that $\mathbf{P}((X_{n+1}, Y_{n+1}) \in A \mid X_n, Y_n) = \int_A g(X_n, Y_n, x, y)\,P_0(X_n, dx)\,Q(Y_n, dy)$.

The latter assumption states that the dynamics of the observed and unobserved processes can be made independent (on finite time intervals) by an equivalent change of measure. It is easily seen that the notion of nondegenerate observations for the classical hidden Markov model is a special case of this assumption; on the other hand, the present assumption also rules out the phenomenon observed in Example 1.2. This general nondegeneracy property appears to be precisely the right assumption required to generalize the results of [20], and seems very natural in view of Examples 1.1 and 1.2. The absolute regularity assumption on $(X_n, Y_n)_{n\in\mathbb{Z}}$ can in fact be weakened somewhat; see Sections 2.4 and 2.5 for a precise statement.
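The two claims in the preceding paragraph can be spelled out as a short worked computation. This is a sketch, not part of the original argument; the symbol $P_X$ for the transition kernel of the unobserved chain in a classical hidden Markov model, and the map $F$ for the update rule of Example 1.2, are notation assumed here for illustration.

```latex
% Classical hidden Markov model with nondegenerate observations:
% X is Markov with (assumed) kernel P_X, and Y_{n+1} has density g(x, y)
% with respect to \varphi given X_{n+1} = x. Then
\[
  \mathbf{P}\bigl((X_{n+1}, Y_{n+1}) \in A \mid X_n, Y_n\bigr)
  = \int_A g(x, y)\, P_X(X_n, dx)\, \varphi(dy),
\]
% which is the general nondegeneracy condition with the choices
\[
  P_0 = P_X, \qquad Q(y', dy) = \varphi(dy), \qquad g(x', y', x, y) = g(x, y) > 0 .
\]
% Example 1.2: X_{n+1} = F(X_n, Y_n) is a deterministic function of (X_n, Y_n), so
\[
  \mathbf{P}\bigl((X_{n+1}, Y_{n+1}) \in \cdot \mid X_n = x', Y_n = y'\bigr)
  = \delta_{F(x', y')} \otimes N(0, 1).
\]
% Nondegeneracy would require \delta_{F(x', y')} \sim P_0(x', \cdot) for every y'.
% But F(x', y') takes two distinct values as the sign of y' varies, and point
% masses at distinct points are mutually singular, so no kernel P_0 can work.
```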

With the above assumptions in place, we will show that Kunita's exchange 
of intersection and supremum of $\sigma$-fields is permitted in our setting, and we
can consequently develop general asymptotic stability and unique ergodicity 
results. The intuition behind the proofs is similar in spirit to the classical 
hidden Markov model setting in [20, 22], and we refer to those papers for 
a discussion of the basic ideas. Nonetheless, to our surprise, key parts of the 
proofs in [20] break down completely in the generalized setting of this paper 
and almost all arguments in [20] require substantial modification, as we can 
no longer exploit many simplifying properties that hold trivially in classical 
hidden Markov models. The proofs in the present paper rely on the ergodic 
properties of nondegenerate Markov chains that are developed in Section 3 
below. Though this paper is almost entirely self-contained, the reader may 
find it helpful to familiarize herself first with the simpler setting of [20]. 

This paper is organized as follows. Section 2 introduces the general model used throughout the paper and states our main results. We also give useful sufficient conditions for the models in Figure 1(a)-(d). Section 3 develops the ergodic properties of nondegenerate Markov chains that play a central role in our proofs. Sections 4-7 are devoted to the proofs of our main results. Appendices A and B collect auxiliary results and a notation list that is used throughout the paper.

2. Preliminaries and main results.

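2.1. The canonical setup. Throughout this paper we consider the bivariate stochastic process $(X_n, Y_n)_{n\in\mathbb{Z}}$, where $X_n$ takes values in the Polish space $E$ and $Y_n$ takes values in the Polish space $F$. We realize this process on the canonical path space $\Omega = \Omega^X \times \Omega^Y$ with $\Omega^X = E^{\mathbb{Z}}$ and $\Omega^Y = F^{\mathbb{Z}}$, such that $X_n(x, y) = x(n)$ and $Y_n(x, y) = y(n)$. Denote by $\mathcal{F}$ the Borel $\sigma$-field of $\Omega$, and define

$$\mathcal{F}^X_I = \sigma\{X_k : k \in I\}, \qquad \mathcal{F}^Y_I = \sigma\{Y_k : k \in I\}, \qquad \mathcal{F}_I = \mathcal{F}^X_I \vee \mathcal{F}^Y_I$$

for $I \subset \mathbb{Z}$. For simplicity of notation, we define the natural filtrations

$$\mathcal{F}^X_n = \mathcal{F}^X_{]-\infty,n]}, \qquad \mathcal{F}^Y_n = \mathcal{F}^Y_{]-\infty,n]}, \qquad \mathcal{F}_n = \mathcal{F}_{]-\infty,n]} \qquad (n \in \mathbb{Z}),$$

and the $\sigma$-fields

$$\mathcal{F}^X = \mathcal{F}^X_{\mathbb{Z}}, \qquad \mathcal{F}^Y = \mathcal{F}^Y_{\mathbb{Z}}, \qquad \mathcal{F}^X_+ = \mathcal{F}^X_{[0,\infty[}, \qquad \mathcal{F}^Y_+ = \mathcal{F}^Y_{[0,\infty[}.$$

Finally, we denote by $Y$ the $F^{\mathbb{Z}}$-valued random variable $(Y_k)_{k\in\mathbb{Z}}$, and the canonical shift $\Theta : \Omega \to \Omega$ is defined as $\Theta(x, y)(m) = (x(m+1), y(m+1))$.

For any Polish space $Z$, we denote by $\mathcal{B}(Z)$ its Borel $\sigma$-field and by $\mathcal{P}(Z)$ the space of all probability measures on $Z$ endowed with the weak convergence topology [thus $\mathcal{P}(Z)$ is again Polish]. Let us recall that any probability kernel $\rho : Z \times \mathcal{B}(Z') \to [0, 1]$ may be equivalently viewed as a $\mathcal{P}(Z')$-valued random variable $z \mapsto \rho(z, \cdot)$ on $(Z, \mathcal{B}(Z))$. For notational convenience, we will implicitly identify probability kernels and random probability measures in the sequel.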

2.2. The model. The basic model of this paper is defined by a Markov transition kernel $P : E \times F \times \mathcal{B}(E \times F) \to [0, 1]$ and a $P$-invariant probability measure $\pi$ on $(E \times F, \mathcal{B}(E \times F))$, which we presume to be fixed throughout the paper. We now define the probability measure $\mathbf{P}$ on $(\Omega, \mathcal{F})$ such that, under $\mathbf{P}$, the process $(X_n, Y_n)_{n\in\mathbb{Z}}$ is the stationary Markov chain with transition kernel $P$ and stationary distribution $\pi$. We interpret $Y_n$ to be the observable component of the model, while $X_n$ is the unobservable component.

As $(X_n, Y_n)_{n\in\mathbb{Z}}$ is a stationary Markov chain under $\mathbf{P}$, the reverse time process $(X_{-n}, Y_{-n})_{n\in\mathbb{Z}}$ is again a stationary Markov chain. We fix throughout the paper a version $P' : E \times F \times \mathcal{B}(E \times F) \to [0, 1]$ of the regular conditional probability $\mathbf{P}((X_{-1}, Y_{-1}) \in \cdot \mid X_0, Y_0)$. Thus, by construction, the process $(X_{-n}, Y_{-n})_{n\in\mathbb{Z}}$ is a stationary Markov chain with transition kernel $P'$ and invariant measure $\pi$.

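In addition to the probability measure $\mathbf{P}$, we introduce the probability kernel $\mathbf{P}^{\cdot} : E \times F \times \mathcal{F} \to [0, 1]$ with the following properties: under $\mathbf{P}^{z,w}$,

(1) $(X_n, Y_n)_{n\ge 0}$ is Markov with transition kernel $P$ and initial measure $\delta_z \otimes \delta_w$;

(2) $(X_{-n}, Y_{-n})_{n\ge 0}$ is Markov with transition kernel $P'$ and initial measure $\delta_z \otimes \delta_w$;

(3) $(X_n, Y_n)_{n\ge 0}$ and $(X_{-n}, Y_{-n})_{n\ge 0}$ are independent.

Clearly $\mathbf{P}^{z,w}$ is a version of the regular conditional probability $\mathbf{P}(\cdot \mid X_0, Y_0)$. Finally, for any probability measure $\nu$ on $(E \times F, \mathcal{B}(E \times F))$, we define

$$\mathbf{P}^{\nu}(A) = \int \mathbf{P}^{z,w}(A)\,\nu(dz, dw) \qquad \text{for all } A \in \mathcal{F}.$$

Note, in particular, that $\mathbf{P}^{\pi}$ coincides with $\mathbf{P}$ by construction.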

2.3. The nonlinear filter. As $X_n$ is not directly observable, we are interested in the conditional distribution of $X_n$ given the history of observations to date $Y_0, \ldots, Y_n$. To this end, we define for every probability measure $\mu$ on $E \times F$ and $n \ge 0$ the nonlinear filter $\Pi^\mu_n : \Omega^Y \times \mathcal{B}(E) \to [0, 1]$ to be a version of the regular conditional probability $\mathbf{P}^\mu(X_n \in \cdot \mid \mathcal{F}^Y_{[0,n]})$. The nonlinear filter is the central object of interest throughout this paper.

We now state some basic properties of the nonlinear filter. The first prop- 
erty establishes that the filter can be computed recursively. 

Lemma 2.1. There is a measurable map $U : \mathcal{P}(E) \times F \times F \to \mathcal{P}(E)$ such that $\Pi^\mu_n = U(\Pi^\mu_{n-1}, Y_{n-1}, Y_n)$ $\mathbf{P}^\mu$-a.s. for every $n \ge 1$ and $\mu \in \mathcal{P}(E \times F)$.

Remark 2.2. In the proof of our main results, it will be convenient to assume that the identity $\Pi^\mu_n = U(\Pi^\mu_{n-1}, Y_{n-1}, Y_n)$ holds everywhere on $\Omega^Y$ and not just $\mathbf{P}^\mu$-a.s. This corresponds to the choice of a particular version of the nonlinear filter. However, as none of our results will depend on the choice of version of the filter, there is clearly no loss of generality in fixing such a convenient version for the purposes of our proofs, as we will do in Section 5.
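When $E$ is finite and the transition kernel has the product form of Assumption 2.8 in Section 2.4, the recursion of Lemma 2.1 takes an explicit Bayes form: $U(\nu, y, y')(z') \propto \sum_z \nu(z)\, g(z, y, z', y')\, P_0(z, z')$, where the reference kernel $Q(y, dy')$ drops out because it does not depend on the hidden state. The sketch below implements this one possible form; it is an illustration under those assumptions, not the construction used in the paper, and `filter_update`, `p0` and `g` are hypothetical names.

```python
def filter_update(nu, y_prev, y_next, p0, g):
    """One step of the filter recursion on a finite hidden state space.

    nu      -- list of probabilities, the law of X_{n-1} given Y_0..Y_{n-1}
    y_prev  -- observed value of Y_{n-1}
    y_next  -- observed value of Y_n
    p0      -- reference kernel as a matrix, p0[z][z_next]
    g       -- strictly positive density g(z, y_prev, z_next, y_next)
    Returns the law of X_n given Y_0..Y_n as a list of probabilities.
    """
    n_states = len(nu)
    weights = [
        sum(nu[z] * g(z, y_prev, z_next, y_next) * p0[z][z_next]
            for z in range(n_states))
        for z_next in range(n_states)
    ]
    total = sum(weights)  # strictly positive because g > 0 and nu, p0 are stochastic
    return [w / total for w in weights]


# Usage on a two-state example with an assumed reference kernel.
p0 = [[0.9, 0.1],
      [0.2, 0.8]]
nu = [0.5, 0.5]

# With g identically 1 the observations carry no information: pure prediction.
predicted = filter_update(nu, 0, 0, p0, lambda z, w, z2, w2: 1.0)

# A density that favours hidden states matching the new observation.
updated = filter_update(nu, 0, 0, p0,
                        lambda z, w, z2, w2: 2.0 if z2 == w2 else 1.0)
```

In this usage example `predicted` is the one-step prediction $\nu P_0$, while `updated` shifts mass toward the hidden state that agrees with the new observation.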

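We now consider $(\Pi^\mu_n)_{n\ge 0}$ as a $\mathcal{P}(E)$-valued stochastic process. The second property establishes that this measure-valued process inherits certain Markovian properties from the underlying model $(X_n, Y_n)_{n\ge 0}$.

Lemma 2.3. There exist Markov transition kernels $\Gamma$ on $\mathcal{P}(E) \times F$ and $\Lambda$ on $\mathcal{P}(E) \times E \times F$ such that the following hold: for every $\mu \in \mathcal{P}(E \times F)$,

(1) $(\Pi^\mu_n, Y_n)_{n\ge 0}$ is a Markov chain under $\mathbf{P}^\mu$ with transition kernel $\Gamma$; and

(2) $(\Pi^\mu_n, X_n, Y_n)_{n\ge 0}$ is a Markov chain under $\mathbf{P}^\mu$ with transition kernel $\Lambda$.

For any $m \in \mathcal{P}(\mathcal{P}(E) \times F)$, define the barycenter $bm \in \mathcal{P}(E \times F)$ as

$$bm(A) = \int 1_A(z, w)\,\nu(dz)\,m(d\nu, dw) \qquad \text{for all } A \in \mathcal{B}(E \times F).$$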

We finally state some properties of $\Gamma$- and $\Lambda$-invariant measures.

Lemma 2.4. For any $\Gamma$-invariant probability measure $m \in \mathcal{P}(\mathcal{P}(E) \times F)$, the barycenter $bm$ is a $P$-invariant probability measure. Conversely, there exists at least one $\Gamma$-invariant probability measure with barycenter $\pi$.

Similarly, for any $\Lambda$-invariant probability measure $M \in \mathcal{P}(\mathcal{P}(E) \times E \times F)$, the marginal $M(\mathcal{P}(E) \times \cdot)$ is a $P$-invariant probability measure. Conversely, there exists at least one $\Lambda$-invariant probability measure with marginal $\pi$.

In general, there may be multiple $\Gamma$-invariant measures with barycenter $\pi$, etc. Our main results will establish uniqueness under suitable assumptions.

Remark 2.5. For the purposes of this paper it suffices to establish the 
above results for the case where Assumption 2.8 below is assumed to hold. 
In this setting, these results will be proved in Sections 6.1 and 7.1. In fact, 
the results in this subsection hold very generally as stated without any 
further assumptions, but the proofs in the general setting are somewhat more 
abstract. Such generality will not be needed in this paper, and we therefore 
leave the generalization of the proofs (along the lines of [22], Appendix A.1)
to the interested reader. 

2.4. Main results. We begin by introducing the fundamental model as- 
sumptions that are required by our main results. Let us emphasize that we 
will at no point in the paper automatically assume that any of these assump- 
tions is in force; all assumptions will be imposed explicitly where they are 
needed. Some useful sufficient conditions will be given in Section 2.5 below. 

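Assumption 2.6 (Marginal ergodicity). The following holds:

$$\int \|\mathbf{P}^{z,w}(X_n \in \cdot) - \mathbf{P}(X_n \in \cdot)\|_{\mathrm{TV}}\,\pi(dz, dw) \xrightarrow{n\to\infty} 0.$$

Assumption 2.7 (Reversed marginal ergodicity). The following holds:

$$\int \|\mathbf{P}^{z,w}(X_{-n} \in \cdot) - \mathbf{P}(X_{-n} \in \cdot)\|_{\mathrm{TV}}\,\pi(dz, dw) \xrightarrow{n\to\infty} 0.$$

Assumption 2.8 (Nondegeneracy). There exist transition probability kernels $P_0 : E \times \mathcal{B}(E) \to [0, 1]$ and $Q : F \times \mathcal{B}(F) \to [0, 1]$ such that

$$P(z, w, dz', dw') = g(z, w, z', w')\,P_0(z, dz')\,Q(w, dw')$$

for some strictly positive measurable function $g : E \times F \times E \times F \to\ ]0, \infty[$.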

We now proceed to state the main results of this paper. Our results address in turn each of the problems discussed in the Introduction: the exchange of intersection and supremum of $\sigma$-fields, asymptotic stability of the nonlinear filter and unique ergodicity of the processes $(\Pi_n, Y_n)_{n\ge 0}$ and $(\Pi_n, X_n, Y_n)_{n\ge 0}$.

Our first result establishes the validity of Kunita's exchange of intersection and supremum, and its time-reversed cousin, in the generalized setting of this paper.

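Theorem 2.9. Suppose that Assumptions 2.6-2.8 are in force. Then

$$\bigcap_{n\ge 0} \mathcal{F}^Y_+ \vee \mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y_+ \qquad \text{and} \qquad \bigcap_{n\ge 0} \mathcal{F}^Y_0 \vee \mathcal{F}^X_{-n} = \mathcal{F}^Y_0, \qquad \mathbf{P}\text{-a.s.}$$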

Our second result concerns filter stability which can be established in our 
setting (as in [20]) in a very strong sense: pathwise and in the total variation 
topology. 

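Theorem 2.10. Suppose that Assumptions 2.6-2.8 are in force. Let $\mu$ be a probability measure on $E \times F$ such that $\mu(E \times \cdot) \ll \pi(E \times \cdot)$ and

$$\mathbf{E}^\mu(\|\mathbf{P}^\mu(X_n \in \cdot \mid Y_0) - \mathbf{P}(X_n \in \cdot)\|_{\mathrm{TV}}) \xrightarrow{n\to\infty} 0.$$

Then $\|\Pi^\mu_n - \Pi^\pi_n\|_{\mathrm{TV}} \to 0$ as $n \to \infty$, $\mathbf{P}^\mu$-a.s. [and $\mathbf{P}$-a.s. if $\mu(E \times \cdot) \sim \pi(E \times \cdot)$].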

Remark 2.11. The assumptions of Theorem 2.10 may be more intuitive when phrased in terms of the filtering recursion in Lemma 2.1. Let $\rho : F \times \mathcal{B}(E) \to [0, 1]$ be a probability kernel, and define the random measures $(\Pi^\rho_n)_{n\ge 0}$ by the recursion

$$\Pi^\rho_0 = \rho(Y_0, \cdot), \qquad \Pi^\rho_n = U(\Pi^\rho_{n-1}, Y_{n-1}, Y_n).$$

Suppose that the dynamics of $(X_n)_{n\ge 0}$ are such that the random initial law $\rho$ is in the domain of attraction of the stationary distribution $\pi$ in the sense that

$$\|\mathbf{P}^{\rho(w,\cdot)\otimes\delta_w}(X_n \in \cdot) - \mathbf{P}(X_n \in \cdot)\|_{\mathrm{TV}} \xrightarrow{n\to\infty} 0 \quad \text{in } \pi(E \times dw)\text{-probability}.$$

Then $\|\Pi^\rho_n - \Pi^\pi_n\|_{\mathrm{TV}} \to 0$ as $n \to \infty$, $\mathbf{P}$-a.s. Indeed, this follows immediately from Theorem 2.10 by setting $\mu(dz, dw) = \rho(w, dz)\,\pi(E \times dw)$. Therefore, we may interpret Theorem 2.10 as follows: the filtering recursion of Lemma 2.1 is asymptotically stable inside the domain of attraction of the stationary distribution.

The result of Theorem 2.10 is easily extended to show $\|\Pi^\mu_n - \Pi^\nu_n\|_{\mathrm{TV}} \to 0$ as $n \to \infty$, $\mathbf{P}^\gamma$-a.s. whenever all three initial measures $\mu, \nu, \gamma$ are in the domain of attraction of the stationary distribution in the above sense, using Corollary 3.6 below.

Our third result concerns uniqueness of the $\Gamma$-invariant measure.

Theorem 2.12. Suppose that Assumptions 2.6-2.8 are in force. Then there exists a unique $\Gamma$-invariant probability measure with barycenter $\pi$. In particular, if $P$ has a unique invariant probability measure, then so does $\Gamma$.

Our fourth result concerns uniqueness of the $\Lambda$-invariant measure. The situation here is a little more complicated; Assumptions 2.6-2.8 only ensure uniqueness within a restricted class of measures (cf. [15]), while a somewhat stronger variant of Assumption 2.6 yields uniqueness in the class of all probability measures.

Theorem 2.13. Suppose that Assumptions 2.6-2.8 hold. Then there exists a unique $\Lambda$-invariant probability measure with marginal $\pi$ on $E \times F$ in the class

$$\Bigl\{ M \in \mathcal{P}(\mathcal{P}(E) \times E \times F) : \text{for every } A \in \mathcal{B}(\mathcal{P}(E)),\ B \in \mathcal{B}(E),\ C \in \mathcal{B}(F),\ \ M(A \times B \times C) = \int 1_A(\nu)\,\nu(B)\,1_C(w)\,M(d\nu, dz, dw) \Bigr\}.$$

If, in addition, we have

$$\int \|\mathbf{P}^{z,w}(X_n \in \cdot) - \mathbf{P}(X_n \in \cdot)\|_{\mathrm{TV}}\,\mu(dz, dw) \xrightarrow{n\to\infty} 0$$

for every probability measure $\mu$ on $E \times F$ such that $\mu(E \times \cdot) = \pi(E \times \cdot)$, then there exists a unique $\Lambda$-invariant probability measure with marginal $\pi$ among all probability measures in $\mathcal{P}(\mathcal{P}(E) \times E \times F)$. If we assume even further that $P$ has a unique invariant probability measure, then so does $\Lambda$.

The following sections are devoted to the proofs of these results: Theorems 2.9, 2.10, 2.12 and 2.13 are proved in Sections 4, 5, 6 and 7, respectively.

2.5. Sufficient conditions. Our main results rely on the fundamental Assumptions 2.6-2.8. In most applications, the form of the transition kernel $P$ is explicitly (or semi-explicitly) given. Existence and uniqueness of an invariant measure $\pi$ and the ergodicity Assumption 2.6 can often be verified in terms of $P$ only (cf. [16]), while the nondegeneracy Assumption 2.8 can be read off directly from the explicit form of $P$. On the other hand, explicit expressions for the invariant measure $\pi$ or the reversed transition kernel $P'$ are often not available, so that Assumption 2.7 may be difficult to verify directly. The goal of this section is to provide sufficient conditions for our main results that are easily verified in practice.

2.5.1. General sufficient conditions. Our main sufficient condition is absolute regularity (cf. [23]) of the process $(X_n, Y_n)_{n\in\mathbb{Z}}$, which was the assumption stated in the Introduction. This is slightly stronger than Assumptions 2.6 and 2.7, but has the benefit that it is automatically time-reversible and therefore easily verifiable.

Lemma 2.14. Suppose that $(X_n, Y_n)_{n\in\mathbb{Z}}$ is absolutely regular,

$$\int \|\mathbf{P}^{z,w}((X_n, Y_n) \in \cdot) - \pi\|_{\mathrm{TV}}\,\pi(dz, dw) \xrightarrow{n\to\infty} 0.$$

Then both Assumptions 2.6 and 2.7 hold true.

Proof. Absolute regularity trivially yields Assumption 2.6. On the other hand, the absolute regularity property of a stationary Markov chain is invariant under time reversal by [20], Proposition 4.4, so that Assumption 2.7 follows. □
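For a finite state space the integral in Lemma 2.14 becomes a finite sum that can be evaluated directly. The sketch below is an illustration only; the helper names `mat_mul` and `regularity_gap` are hypothetical. It computes $\sum_s \pi(s)\,\|P^n(s, \cdot) - \pi\|_{\mathrm{TV}}$, with the total variation norm taken as the sum of absolute differences, for the four-state chain $X_n = (\xi_n, \xi_{n+1})$ of Example 1.1.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def regularity_gap(p, pi, n):
    """Sum over s of pi(s) * ||P^n(s, .) - pi||, the finite-state analogue of
    the absolute regularity integral (norm = sum of absolute differences)."""
    size = len(p)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        power = mat_mul(power, p)
    return sum(pi[s] * sum(abs(power[s][t] - pi[t]) for t in range(size))
               for s in range(size))


# Chain of Example 1.1, states ordered 00, 01, 10, 11: the next state starts
# with the second bit of the current state and its second bit is a fair coin.
p = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5],
     [0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
pi = [0.25, 0.25, 0.25, 0.25]

gaps = [regularity_gap(p, pi, n) for n in range(4)]
```

For this chain the gap vanishes after two steps, so the hidden process is absolutely regular in the strongest possible way. The filter instability seen in Example 1.1 therefore comes from the degenerate observations, not from a lack of ergodicity.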

Similarly, the convergence assumption in Theorem 2.10 also admits a slight- 
ly stronger but potentially more easily verified counterpart. 

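Lemma 2.15. Suppose that Assumption 2.6 holds. Let $\mu$ be a probability measure on $E \times F$ such that $\|\mathbf{P}^\mu((X_n, Y_n) \in \cdot) - \pi\|_{\mathrm{TV}} \to 0$ as $n \to \infty$. Then

$$\mathbf{E}^\mu(\|\mathbf{P}^\mu(X_n \in \cdot \mid Y_0) - \mathbf{P}(X_n \in \cdot)\|_{\mathrm{TV}}) \xrightarrow{n\to\infty} 0.$$

Proof. Define the quantity

$$\Delta_k(x, y) = \|\mathbf{P}^{x,y}(X_k \in \cdot) - \mathbf{P}(X_k \in \cdot)\|_{\mathrm{TV}}.$$

By the stationarity of $\mathbf{P}$, the Markov property and $\|\Delta_k - 1\|_\infty \le 1$, we can estimate

$$\mathbf{E}^\mu(\|\mathbf{P}^\mu(X_{n+k} \in \cdot \mid Y_0) - \mathbf{P}(X_{n+k} \in \cdot)\|_{\mathrm{TV}})$$
$$\le \mathbf{E}^\mu(\|\mathbf{P}^{X_n,Y_n}(X_k \in \cdot) - \mathbf{P}(X_k \in \cdot)\|_{\mathrm{TV}})$$
$$= \mathbf{E}(\Delta_k(X_n, Y_n)) + \{\mathbf{E}^\mu(\Delta_k(X_n, Y_n) - 1) - \mathbf{E}(\Delta_k(X_n, Y_n) - 1)\}$$
$$\le \mathbf{E}(\|\mathbf{P}^{X_0,Y_0}(X_k \in \cdot) - \mathbf{P}(X_k \in \cdot)\|_{\mathrm{TV}}) + \|\mathbf{P}^\mu((X_n, Y_n) \in \cdot) - \pi\|_{\mathrm{TV}}.$$

This expression converges to zero as $k, n \to \infty$ by our assumptions. □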

2.5.2. Generalized hidden Markov models. We now consider the special case where the underlying model $(X_n, Y_n)_{n\in\mathbb{Z}}$ is a generalized hidden Markov model, whose dependence structure is illustrated in Figure 1(b). Under Assumption 2.8, this dependence structure is enforced by the additional requirement that

$$\int g(z, w, z', w')\,Q(w, dw') = 1 \qquad \text{for all } w \in F,\ z, z' \in E.$$

This implies that $(X_n)_{n\in\mathbb{Z}}$ is itself Markovian under $\mathbf{P}$ with transition kernel $P_0$, and the probability measure $\pi_0 = \pi(\cdot \times F)$ must then be $P_0$-invariant. In this setting, it suffices to consider the ergodic properties of the unobserved process, provided that the reference transition kernel $Q(w, dw')$ does not depend on $w$.

Lemma 2.16. Suppose that Assumption 2.8 holds with $Q(w, dw') = \varphi(dw')$ for some probability measure $\varphi$ on $F$, and that $(X_n, Y_n)_{n\in\mathbb{Z}}$ is a generalized hidden Markov model in the above sense. If $(X_n)_{n\in\mathbb{Z}}$ is absolutely regular

$$\int \|P_0^n(z, \cdot) - \pi_0\|_{\mathrm{TV}}\,\pi_0(dz) \xrightarrow{n\to\infty} 0,$$

then both Assumptions 2.6 and 2.7 hold true.

Proof. We reduce to the case of Lemma 2.14. A stationary Markov chain is absolutely regular if and only if for almost every pair of initial conditions, there is a finite time $n$ at which the laws of the chain are not mutually singular (e.g., this is a special case of Theorem 4.1 below). Therefore, our assumption implies that for $\pi_0 \otimes \pi_0$-a.e. $(z, z')$, there is an $n \ge 1$ such that $P_0^n(z, \cdot)$ and $P_0^n(z', \cdot)$ are not mutually singular. But as $Q(w, dw') = \varphi(dw')$ and by Assumption 2.8, we have $P^n(z, w, \cdot) \sim P_0^n(z, \cdot) \otimes \varphi$ and $P^n(z', w', \cdot) \sim P_0^n(z', \cdot) \otimes \varphi$ for every $z, w, z', w'$. It follows that for $\pi \otimes \pi$-a.e. $((z, w), (z', w'))$ there is an $n \ge 1$ such that $P^n(z, w, \cdot)$ and $P^n(z', w', \cdot)$ are not mutually singular. We have therefore shown that the absolute regularity assumption of Lemma 2.14 holds. □

Remark 2.17. By the generalized hidden Markov structure $\mathbf{P}^{z,w}(X_n \in \cdot) = P_0^n(z, \cdot)$ is independent of $w$, so that Assumption 2.6 follows immediately from the absolute regularity of $(X_n)_{n\in\mathbb{Z}}$. Unfortunately, the generalized hidden Markov property is not invariant under time reversal, so this argument does not guarantee that Assumption 2.7 holds. The additional assumption that $Q(w, dw') = \varphi(dw')$ allows us to circumvent this problem by reducing to the case of Lemma 2.14.

We also have a counterpart of Lemma 2.15. 

Lemma 2.18. Suppose the assumptions of Lemma 2.16 hold. Let $\mu$ be a probability measure on $E \times F$ so that $\|\mu(\cdot \times F)P_0^n - \pi_0\|_{\mathrm{TV}} \to 0$ as $n \to \infty$. Then

$$\mathbf{E}^\mu(\|\mathbf{P}^\mu(X_n \in \cdot \mid Y_0) - \mathbf{P}(X_n \in \cdot)\|_{\mathrm{TV}}) \xrightarrow{n\to\infty} 0.$$

Proof. We reduce to the case of Lemma 2.15. As $Q(w, dw') = \varphi(dw')$, we obtain $\pi \sim \pi_0 \otimes \varphi$ and $\mu P^n \sim \mu(\cdot \times F)P_0^n \otimes \varphi$ for all $n \ge 1$ by Assumption 2.8. Choose $S_n \in \mathcal{B}(E)$ such that $\mu(\cdot \times F)P_0^n(\cdot \cap S_n) \ll \pi_0$ and $\pi_0(S_n^c) = 0$ for all $n$ [so $S_n$ defines the Lebesgue decomposition of $\mu(\cdot \times F)P_0^n$ with respect to $\pi_0$]. Then clearly $\mu P^n(\cdot \cap S_n \times F) \ll \pi$ and $\pi(S_n^c \times F) = 0$. Therefore

$$\|\mu P^{n+k} - \pi\|_{\mathrm{TV}} \le \mu P^n(S_n \times F)\,\|\nu_n P^k - \pi\|_{\mathrm{TV}} + 2\mu P^n(S_n^c \times F)$$
$$\le \|\nu_n P^k - \pi\|_{\mathrm{TV}} + \|\mu(\cdot \times F)P_0^n - \pi_0\|_{\mathrm{TV}},$$

where we have defined $\nu_n = \mu P^n(\cdot \cap S_n \times F)/\mu P^n(S_n \times F)$. But as $(X_n, Y_n)_{n\in\mathbb{Z}}$ is absolutely regular (cf. Lemma 2.16) and $\nu_n \ll \pi$, the first term converges to zero as $k \to \infty$. Letting $n \to \infty$ and applying Lemma 2.15 yields the result. □

2.5.3. Hidden Markov models with correlated noise. We now turn to the special case where the underlying model $(X_n, Y_n)_{n\in\mathbb{Z}}$ is a hidden Markov model with correlated noise, whose dependence structure is illustrated in Figure 1(c). Under Assumption 2.8, this dependence structure is enforced by the following requirement: there is a probability measure $\varphi$ on $F$ such that $Q(w, dw') = \varphi(dw')$, and there are measurable functions $g_X : E \times F \times E \to \mathbb{R}_+$ and $g_Y : E \times F \to \mathbb{R}_+$ such that

$$g(z, w, z', w') = g_X(z, w, z')\,g_Y(z', w'), \qquad \int g_Y(z, w)\,\varphi(dw) = 1.$$

Unlike in the case of a generalized hidden Markov model, in the present model the probabilities $\mathbf{P}^{z,w}(X_n \in \cdot)$ do depend on $w$. Nonetheless, in the present case the unobserved process $(X_n)_{n\in\mathbb{Z}}$ is still Markov under the stationary measure $\mathbf{P}$ with respect to its own filtration, with transition kernel $P_0$ given for $A \in \mathcal{B}(E)$ by

$$P_0(z, A) = \int P(z, w, A \times F)\,g_Y(z, w)\,\varphi(dw).$$

To see this, note that $\pi(dz, dw) = g_Y(z, w)\,\pi(dz \times F)\,\varphi(dw)$ by our assumption on $P$ and $\pi P = \pi$, so we can compute $\mathbf{P}(X_{n+1} \in A \mid \mathcal{F}^X_n) = P_0(X_n, A)$.

Remark 2.19. Unlike in the case of a generalized hidden Markov model, where $Q(w, dw') = \varphi(dw')$ is an additional assumption, in the present setting the assumption $Q(w, dw') = \varphi(dw')$ entails no loss of generality. Indeed, the hidden Markov structure with correlated noise can be generally formulated by the requirement that $P(z, w, dz', dw') = P_X(z, w, dz')\,P_Y(z', dw')$ for some probability kernels $P_X$ and $P_Y$. It is easily seen that any such model that also satisfies Assumption 2.8 must have the above form for a suitable choice of $\varphi$.

The idea is now that in the present setting, it suffices to consider the ergodic properties of the unobserved process (i.e., the transition kernel $P_0$).

Lemma 2.20. Suppose that Assumption 2.8 holds and that $(X_n,Y_n)_{n\in\mathbb{Z}}$ is a hidden Markov model with correlated noise in the above sense. If also

$$\int\|P_0^n(z,\cdot)-\pi_0\|_{\mathrm{TV}}\,\pi_0(dz)\xrightarrow{n\to\infty}0,$$

where $\pi_0=\pi(\,\cdot\times F)$, then both Assumptions 2.6 and 2.7 hold true.

Proof. Note that for all $(z,w)\in E\times F$, $B\in\mathcal{B}(E\times F)$ and $n\ge1$ we have

Therefore, we have for $n\ge1$

$$\int\|P^n(z,w,\cdot)-\pi\|_{\mathrm{TV}}\,\pi(dz,dw)\le\int\|P_0^{n-1}(z,\cdot)-\pi_0\|_{\mathrm{TV}}\,\pi_0(dz).$$

The result now follows directly from Lemma 2.14. □
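The display that should follow "we have" in the proof of Lemma 2.20 is missing from this copy. As a hedged reconstruction (not the authors' text), one identity that is consistent with the stated bound can be derived from the factorization of Section 2.5.3, assuming $P(z,w,dz',dw')=P(z,w,dz'\times F)\,g_Y(z',w')\,\varphi(dw')$:

```latex
% Reconstruction under the stated assumption; P_0 is the kernel of Section 2.5.3.
P^n(z,w,B) = \int I_B(z'',w'')\, g_Y(z'',w'')\, \varphi(dw'')\,
             P_0^{\,n-1}(z',dz'')\, P(z,w,dz'\times F),
\qquad
\pi(B) = \int I_B(z'',w'')\, g_Y(z'',w'')\, \varphi(dw'')\, \pi_0(dz'').
```

Subtracting the two expressions gives $\|P^n(z,w,\cdot)-\pi\|_{\mathrm{TV}}\le\int\|P_0^{n-1}(z',\cdot)-\pi_0\|_{\mathrm{TV}}\,P(z,w,dz'\times F)$. Integrating against $\pi(dz,dw)$ and using $\int\pi(dz,dw)\,P(z,w,dz'\times F)=\pi_0(dz')$ then yields the inequality stated in the proof.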

In the present setting (as in the case of a classical hidden Markov model), the most natural initial measures $\mu$ are those that are compatible with the observation model in the sense that $\mu(dz,dw)=g_Y(z,w)\,\mu_0(dz)\,\varphi(dw)$ for some probability measure $\mu_0$ on $E$. This yields the following counterpart of Lemma 2.18, whose proof (by reduction to Lemma 2.15) is trivial and is therefore omitted.

Lemma 2.21. Suppose the assumptions of Lemma 2.20 hold. Let $\mu_0$ be a probability measure on $E$ such that $\|\mu_0P_0^n-\pi_0\|_{\mathrm{TV}}\to0$ as $n\to\infty$. Then

$$\mathbf{E}^\mu\big(\|\mathbf{P}^\mu(X_n\in\cdot\mid Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0,$$

where we have defined $\mu(dz,dw)=g_Y(z,w)\,\mu_0(dz)\,\varphi(dw)$.

Remark 2.22. Let us note that in all of the special cases discussed above the process $(X_n,Y_n)_{n\in\mathbb{Z}}$ is absolutely regular so that Assumptions 2.6 and 2.7 hold by virtue of Lemma 2.14. Absolute regularity of $(X_n,Y_n)_{n\in\mathbb{Z}}$ is not necessary, however, for Assumptions 2.6 and 2.7 to hold. For example, in the trivial case that Assumption 2.8 holds with $g=1$, it is easily seen that Assumptions 2.6 and 2.7 hold if and only if the unobserved process $(X_n)_{n\in\mathbb{Z}}$ is absolutely regular, while the pair process $(X_n,Y_n)_{n\in\mathbb{Z}}$ need not even be ergodic [e.g., when $Q(w,dw')=\delta_w(dw')$]. Thus Assumptions 2.6 and 2.7 are strictly weaker than the absolute regularity of the pair process $(X_n,Y_n)_{n\in\mathbb{Z}}$. Nonetheless, the latter assumption is very mild and will likely hold in most applications of practical interest.
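For a finite hidden state space, the averaged total variation condition used in Lemma 2.20 can be evaluated directly. The sketch below is only an illustration: the two-state kernel `P0` and its invariant law `pi0` are made-up, and the total variation norm is taken to be the $\ell^1$ distance between probability vectors (the supremum over $|f|\le1$), which is an assumption about the paper's normalization.

```python
# Averaged total variation distance to stationarity for a finite kernel P0.
# The kernel is a hypothetical example; TV is the l1 distance between rows.
P0 = [[0.9, 0.1],
      [0.2, 0.8]]
pi0 = [2.0 / 3.0, 1.0 / 3.0]   # solves pi0 P0 = pi0 for this particular kernel


def matmul(A, B):
    """Product of two small matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def averaged_tv(n):
    """Return sum_z pi0(z) * || P0^n(z, .) - pi0 ||_TV."""
    Pn = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        Pn = matmul(Pn, P0)
    return sum(pi0[z] * sum(abs(Pn[z][x] - pi0[x]) for x in range(2))
               for z in range(2))


values = [averaged_tv(n) for n in (0, 1, 5, 20, 80)]
print(values)
```

For this kernel the second eigenvalue is 0.7, so the printed values decay geometrically; a kernel such as the identity would instead give a constant positive value.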

3. Nondegenerate Markov chains. The nondegeneracy Assumption 2.8 
will play an essential role in our theory. Before we can turn to the proofs of 
our main results, we must therefore begin by establishing some general con- 
sequences of the nondegeneracy assumption that will be needed throughout 
the paper. 

3.1. Product structure of the invariant measure. Assumption 2.8 states that the transition kernel $P$ of the Markov chain $(X_n,Y_n)_{n\in\mathbb{Z}}$ is equivalent to a product of transition kernels of two independent Markov chains. Our first question is, therefore, whether this forces the invariant measure $\pi$ to possess a similar product structure; that is, if a stationary Markov chain is nondegenerate, then is its invariant measure necessarily equivalent to the product of its marginals? In general, of course, the answer is negative (e.g., consider the case where $P$ is the identity and $\pi$ is any probability measure that is not equivalent to a product measure). However, we will presently show that if, in addition to nondegeneracy, we assume that the marginal process $(X_n)_{n\in\mathbb{Z}}$ is ergodic in a suitable sense, then $\pi$ is forced to possess the desired product structure.
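On a finite state space, equivalence of $\pi$ to the product of its marginals simply means that the support of $\pi$ is a product set. The following sketch contrasts the identity-kernel counterexample with a case where the hidden chain is ergodic. It is a toy illustration under assumptions of my own (two-point spaces, density $g=1$, hypothetical kernel `P0`), not a proof of anything.

```python
E = [0, 1]
F = [0, 1]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def step(mu, P0, Q):
    """One step of the product kernel P0 (x) Q (density g = 1) applied to mu."""
    return [[sum(mu[z][w] * P0[z][z2] * Q[w][w2] for z in E for w in F)
             for w2 in F] for z2 in E]


def iterate(mu, P0, Q, n=200):
    """Apply the kernel n times; used here to approximate an invariant measure."""
    for _ in range(n):
        mu = step(mu, P0, Q)
    return mu


def support_is_product(pi, tol=1e-12):
    """True if supp(pi) equals supp(pi_X) x supp(pi_Y)."""
    pix = [sum(pi[z]) for z in E]
    piy = [sum(pi[z][w] for z in E) for w in F]
    return all((pi[z][w] > tol) == (pix[z] > tol and piy[w] > tol)
               for z in E for w in F)


diag = [[0.5, 0.0], [0.0, 0.5]]      # mass on the diagonal {(0,0), (1,1)}
# Degenerate hidden dynamics: P is the identity, so diag itself is invariant.
pi_bad = iterate(diag, IDENTITY, IDENTITY)
# Ergodic hidden dynamics (hypothetical kernel), frozen observations.
P0 = [[0.9, 0.1], [0.2, 0.8]]
pi_good = iterate(diag, P0, IDENTITY)
print(support_is_product(pi_bad), support_is_product(pi_good))  # prints: False True
```

In the second case the observation component is not ergodic at all, yet the limiting measure still has product support, which is the kind of conclusion Proposition 3.3 formalizes.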

We need two lemmas. The first states that the nondegeneracy of the transition kernel $P$ implies that the iterates $P^n$ are also nondegenerate; in fact, we show that $\mathbf{P}((X_n,Y_n)\in\cdot\mid X_0,Y_0)\sim\mathbf{P}(X_n\in\cdot\mid X_0)\otimes\mathbf{P}(Y_n\in\cdot\mid Y_0)$.

Lemma 3.1. Suppose that Assumption 2.8 is in force. Choose fixed versions $\pi^X(w,dz)$, $\pi^Y(z,dw)$ of the regular conditional probabilities $\mathbf{P}(X_0\in\cdot\mid Y_0)$, $\mathbf{P}(Y_0\in\cdot\mid X_0)$, respectively, and define the probability kernels

$$P^X_n(z,A)=\int I_A(z')\,P^n(z,w,dz',dw')\,\pi^Y(z,dw),$$

$$P^Y_n(w,B)=\int I_B(w')\,P^n(z,w,dz',dw')\,\pi^X(w,dz).$$

Then we have for all $n\in\mathbb{N}$

$$P^n(z,w,dz',dw')=G_n(z,w,z',w')\,P^X_n(z,dz')\,P^Y_n(w,dw'),$$

where $G_n:E\times F\times E\times F\to\,]0,\infty[$ are strictly positive measurable functions.

Proof. From Assumption 2.8, it follows directly that

$$P^n(z,w,dz',dw')=g_n(z,w,z',w')\,P_0^n(z,dz')\,Q^n(w,dw')$$

for some strictly positive measurable function $g_n:E\times F\times E\times F\to\,]0,\infty[$. But then the result follows directly from the definition of $P^X_n$, $P^Y_n$ with

$$G_n(z,w,z',w')=g_n(z,w,z',w')\left[\int g_n(z,\tilde w,z',\tilde w')\,Q^n(\tilde w,d\tilde w')\,\pi^Y(z,d\tilde w)\right]^{-1}\left[\int g_n(\tilde z,w,\tilde z',w')\,P_0^n(\tilde z,d\tilde z')\,\pi^X(w,d\tilde z)\right]^{-1}.$$

The proof is complete. □
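The first step of this proof, $P^n\sim P_0^n\otimes Q^n$, has a simple finite-state reading: the $n$-step kernel of the pair chain charges exactly the pairs that the two reference chains can reach independently. The sketch below checks this for a made-up example; the kernels `P0`, `Q` and the positive function `weight` used to build a density $g$ are assumptions for illustration only.

```python
from itertools import product

E = [0, 1, 2]
F = [0, 1]
# Hypothetical reference kernels with some zero entries.
P0 = [[0.0, 1.0, 0.0],
      [0.0, 0.0, 1.0],
      [0.5, 0.0, 0.5]]
Q = [[0.0, 1.0],
     [1.0, 0.0]]


def weight(z, w, z2, w2):
    """An arbitrary strictly positive weight, used to build the density g."""
    return 1.0 + z + 2 * w + 3 * z2 + 0.5 * w2


states = list(product(E, F))
# P = g * (P0 x Q), with g normalized so that each row of P sums to one.
P = {}
for (z, w) in states:
    norm = sum(weight(z, w, a, b) * P0[z][a] * Q[w][b] for (a, b) in states)
    for (z2, w2) in states:
        P[(z, w), (z2, w2)] = weight(z, w, z2, w2) * P0[z][z2] * Q[w][w2] / norm


def power(K, idx, n):
    """n-th power of a kernel stored as a dict keyed by pairs of idx elements."""
    R = {(a, b): float(a == b) for a in idx for b in idx}
    for _ in range(n):
        R = {(a, b): sum(R[a, c] * K[c, b] for c in idx) for a in idx for b in idx}
    return R


P0d = {(a, b): P0[a][b] for a in E for b in E}
Qd = {(a, b): Q[a][b] for a in F for b in F}


def same_support(n):
    """Compare the supports of P^n and of P0^n (x) Q^n."""
    Pn, P0n, Qn = power(P, states, n), power(P0d, E, n), power(Qd, F, n)
    return all((Pn[(z, w), (z2, w2)] > 0) == (P0n[z, z2] * Qn[w, w2] > 0)
               for (z, w) in states for (z2, w2) in states)


print([same_support(n) for n in range(1, 7)])
```

Every entry of the printed list should be `True`, because a path of the pair chain has positive probability exactly when its two coordinate paths do.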

The second lemma states that if the unobserved process $(X_n)_{n\in\mathbb{Z}}$ is ergodic in a suitable sense, and if the nondegeneracy assumption holds, then every $P$-invariant function is independent of its unobserved component.

Lemma 3.2. Suppose that Assumption 2.8 is in force, and that

$$\int\|P^X_n(z,\cdot)-\pi(\,\cdot\times F)\|_{\mathrm{TV}}\,\pi(dz\times F)\xrightarrow{n\to\infty}0.$$

Then for any bounded measurable function $f:E\times F\to\mathbb{R}$ that is $P$-invariant (i.e., $f=Pf$), there exists a bounded measurable function $g:F\to\mathbb{R}$ such that $f(z,w)=g(w)$ for $\pi$-a.e. $(z,w)\in E\times F$.

Proof. As $f$ is $P$-invariant, the process $(f(X_n,Y_n))_{n\ge0}$ is a martingale under $\mathbf{P}$. By stationarity and the martingale convergence theorem,

$$\mathbf{E}(|f(X_n,Y_n)-f(X_0,Y_0)|)=\mathbf{E}(|f(X_{n+k},Y_{n+k})-f(X_k,Y_k)|)\xrightarrow{k\to\infty}0.$$

In particular, we have

$$\int\mathbf{P}^{z,w}\big(f(X_0,Y_0)=f(X_n,Y_n)\text{ for all }n\ge0\big)\,\pi(dz,dw)=1.$$

Therefore, we may choose a set $H_1\in\mathcal{B}(E\times F)$ with $\pi(H_1)=1$ such that

$$\mathbf{P}^{z,w}\big(f(z,w)=f(X_n,Y_n)\text{ for all }n\ge0\big)=1\quad\text{for all }(z,w)\in H_1.$$

Next, let $\rho:F\times\mathcal{B}(E)\to[0,1]$ be a version of the regular conditional probability $\mathbf{P}(X_0\in\cdot\mid Y_0)$. Then by our assumption and the triangle inequality,

$$\int\|P^X_n(z,\cdot)-P^X_n(z',\cdot)\|_{\mathrm{TV}}\,\rho(w,dz)\,\rho(w,dz')\,\pi(E\times dw)\le2\int\|P^X_n(z,\cdot)-\pi(\,\cdot\times F)\|_{\mathrm{TV}}\,\pi(dz\times F)\xrightarrow{n\to\infty}0.$$

Therefore, using Fatou's lemma, we can choose a set $H_2\in\mathcal{B}(E\times E\times F)$ of $(\rho\otimes\rho)\pi(E\times\cdot\,)$-full measure such that

$$\liminf_{n\to\infty}\|P^X_n(z,\cdot)-P^X_n(z',\cdot)\|_{\mathrm{TV}}=0\quad\text{for all }(z,z',w)\in H_2.$$

Now define the set $H\in\mathcal{B}(E\times E\times F)$ as follows:

$$H=\{(z,z',w)\in E\times E\times F:(z,w),(z',w)\in H_1\}\cap H_2.$$

Then it is easily seen that the set $H$ has $(\rho\otimes\rho)\pi(E\times\cdot\,)$-full measure.

We now claim that $f(z,w)=f(z',w)$ for every $(z,z',w)\in H$. To see this, let us fix some point $(z,z',w)\in H$, and choose $n>0$ such that

$$\|P^X_n(z,\cdot)-P^X_n(z',\cdot)\|_{\mathrm{TV}}<1.$$

Thus $P^X_n(z,\cdot)$ and $P^X_n(z',\cdot)$ are not mutually singular. By Lemma 3.1

$$P^n(z,w,\cdot)\sim P^X_n(z,\cdot)\otimes P^Y_n(w,\cdot),\qquad P^n(z',w,\cdot)\sim P^X_n(z',\cdot)\otimes P^Y_n(w,\cdot).$$

Therefore, $P^n(z,w,\cdot)$ and $P^n(z',w,\cdot)$ are not mutually singular. But note that, by the definition of $H$, $P^n(z,w,\cdot)$ is supported on the set

$$\Xi_1=\{(\tilde z,\tilde w)\in E\times F:f(z,w)=f(\tilde z,\tilde w)\},$$

while $P^n(z',w,\cdot)$ is supported on the set

$$\Xi_2=\{(\tilde z,\tilde w)\in E\times F:f(z',w)=f(\tilde z,\tilde w)\}.$$

Thus the fact that $P^n(z,w,\cdot)$ and $P^n(z',w,\cdot)$ are not mutually singular implies that $\Xi_1\cap\Xi_2\ne\varnothing$, which establishes the claim.

To complete the proof, define $g(w)=\int f(z,w)\,\rho(w,dz)$. Then

$$\int|f(z,w)-g(w)|\,\pi(dz,dw)\le\int|f(z,w)-f(z',w)|\,\rho(w,dz)\,\rho(w,dz')\,\pi(E\times dw)=0.$$

Thus $f(z,w)=g(w)$ for $\pi$-a.e. $(z,w)\in E\times F$ as desired. □

We can now prove the main result of this subsection: if the nondegeneracy assumption holds, and if, in addition, the unobserved component $(X_n)_{n\in\mathbb{Z}}$ is ergodic, then the invariant measure $\pi$ is necessarily equivalent to the product of its marginals. Note that the ergodicity assumption in this result automatically holds when Assumption 2.6 is in force.

Proposition 3.3. Suppose that Assumption 2.8 is in force, and that

$$\int\|P^X_n(z,\cdot)-\pi(\,\cdot\times F)\|_{\mathrm{TV}}\,\pi(dz\times F)\xrightarrow{n\to\infty}0.$$

Then there exists a strictly positive measurable function $h:E\times F\to\,]0,\infty[$ such that $\pi(dz,dw)=h(z,w)\,\pi(dz\times F)\,\pi(E\times dw)$.

Proof. We begin by noting that

$$\pi(A\times F)=\int\pi(dz\times F)\,P^X_n(z,A),\qquad\pi(E\times B)=\int\pi(E\times dw)\,P^Y_n(w,B)$$

by the invariance of $\pi$. Now let $C\in\mathcal{B}(E\times F)$ be a set such that $\pi(C)=0$. As $\pi P^n=\pi$, it follows from Lemma 3.1 that

$$\int I_C(z',w')\,P^X_n(z,dz')\,P^Y_n(w,dw')\,\pi(dz,dw)=0$$

for all $n\in\mathbb{N}$. But note that

$$\int I_C(z,w)\,\pi(dz\times F)\,\pi(E\times dw)=\int I_C(z',w')\,\pi(dz'\times F)\,P^Y_n(w,dw')\,\pi(dz,dw)$$

$$=\int I_C(z',w')\,\{\pi(dz'\times F)-P^X_n(z,dz')\}\,P^Y_n(w,dw')\,\pi(dz,dw)$$

$$\le\int\|P^X_n(z,\cdot)-\pi(\,\cdot\times F)\|_{\mathrm{TV}}\,\pi(dz\times F).$$

Letting $n\to\infty$ and using the ergodicity assumption gives

$$\int I_C(z,w)\,\pi(dz\times F)\,\pi(E\times dw)=0.$$

As this holds for any set $C$ such that $\pi(C)=0$, we have evidently shown that $\pi(dz\times F)\,\pi(E\times dw)\ll\pi(dz,dw)$. Conversely, choose a set $C$ such that

$$\int I_C(z,w)\,\pi(dz\times F)\,\pi(E\times dw)=0.$$

Then, by Lemma 3.1, we have

$$\int P^n(z,w,C)\,\pi(dz\times F)\,\pi(E\times dw)=0$$

for all $n\in\mathbb{N}$. By the Birkhoff ergodic theorem,

$$\frac1N\sum_{n=1}^NP^n(z,w,C)\xrightarrow{N\to\infty}f(z,w)\quad\text{for }\pi\text{-a.e. }(z,w)\in E\times F,$$

where $f$ is a $P$-invariant function with $\pi(f)=\pi(C)$. Moreover, by Lemma 3.2 we have $f(z,w)=g(w)$ for $\pi$-a.e. $(z,w)\in E\times F$ for some function $g$. But as we have already shown that $\pi(dz\times F)\,\pi(E\times dw)\ll\pi(dz,dw)$, these statements hold $\pi(dz\times F)\,\pi(E\times dw)$-a.e. also. Therefore,

$$0=\frac1N\sum_{n=1}^N\int P^n(z,w,C)\,\pi(dz\times F)\,\pi(E\times dw)\xrightarrow{N\to\infty}\int g(w)\,\pi(E\times dw)=\int f(z,w)\,\pi(dz,dw)=\pi(C).$$

As this holds for any $C$ such that $\int I_C(z,w)\,\pi(dz\times F)\,\pi(E\times dw)=0$, we evidently have $\pi(dz,dw)\ll\pi(dz\times F)\,\pi(E\times dw)$, and the proof is complete. □



3.2. Reversed nondegeneracy. One important consequence of Proposition 3.3 is that, if the unobserved process $(X_n)_{n\in\mathbb{Z}}$ is ergodic and the transition kernel $P$ is nondegenerate, then the nondegeneracy Assumption 2.8 holds also in reverse time (i.e., the backward transition kernel $P'$ must be nondegenerate also). In particular, this implies that the Assumptions 2.6-2.8 are invariant under time reversal.

Lemma 3.4. Suppose that Assumption 2.8 is in force, and that

$$\int\|P^X_n(z,\cdot)-\pi(\,\cdot\times F)\|_{\mathrm{TV}}\,\pi(dz\times F)\xrightarrow{n\to\infty}0.$$

Then $P'$ is also nondegenerate; that is, there exist transition probability kernels $P_0':E\times\mathcal{B}(E)\to[0,1]$ and $Q':F\times\mathcal{B}(F)\to[0,1]$ such that

$$P'(z,w,dz',dw')=g'(z,w,z',w')\,P_0'(z,dz')\,Q'(w,dw')$$

for some strictly positive measurable function $g':E\times F\times E\times F\to\,]0,\infty[$.

Proof. Note that by Proposition 3.3 and Assumption 2.8

$$\mathbf{P}((X_0,Y_0,X_1,Y_1)\in B)=\int_Bh(x_0,y_0)\,g(x_0,y_0,x_1,y_1)\,\rho(dx_0,dx_1)\,\kappa(dy_0,dy_1),$$

where $\rho(dx,dx')=\pi(dx\times F)\,P_0(x,dx')$, $\kappa(dy,dy')=\pi(E\times dy)\,Q(y,dy')$, and where $g$, $h$ are strictly positive measurable functions. Let us now fix any versions $r(x_1,dx_0)$ and $k(y_1,dy_0)$ of the regular conditional probabilities $\rho(X_0\in\cdot\mid X_1)$ and $\kappa(Y_0\in\cdot\mid Y_1)$, respectively. Then by the Bayes formula,

$$\mathbf{P}((X_0,Y_0)\in A\mid X_1,Y_1)=\frac{\int_Ah(z,w)\,g(z,w,X_1,Y_1)\,r(X_1,dz)\,k(Y_1,dw)}{\int h(z,w)\,g(z,w,X_1,Y_1)\,r(X_1,dz)\,k(Y_1,dw)}.$$

As $P'$ is a version of $\mathbf{P}((X_0,Y_0)\in\cdot\mid X_1,Y_1)$, the result follows. □

3.3. Equivalence of the observations. We now turn to a different consequence of the nondegeneracy assumption. It is easily seen that when Assumption 2.8 holds, the laws of $(Y_0,\ldots,Y_n)$ under $\mathbf{P}^{z,w}$ and $\mathbf{P}^{z',w}$ are equivalent for any $z,z'\in E$, $w\in F$, $n<\infty$. That is, the laws of the observed process under different initializations of the unobserved process are equivalent on any finite time horizon. To prove our main results, however, we will require such an equivalence to hold on the infinite time horizon. The following result is therefore of central importance.

Proposition 3.5. Suppose that Assumption 2.8 holds. Let $\xi,\xi'$ be probability measures on $(E,\mathcal{B}(E))$, let $\eta$ be a probability measure on $(F,\mathcal{B}(F))$ and let $v:E\times F\to\,]0,\infty[$ and $v':E\times F\to\,]0,\infty[$ be strictly positive measurable functions. Define the probability measures on $(E\times F,\mathcal{B}(E\times F))$

$$\nu(dx,dy)=v(x,y)\,\xi(dx)\,\eta(dy),\qquad\nu'(dx,dy)=v'(x,y)\,\xi'(dx)\,\eta(dy).$$

If $\liminf_{n\to\infty}\|\mathbf{P}^\nu(X_n\in\cdot\,)-\mathbf{P}^{\nu'}(X_n\in\cdot\,)\|_{\mathrm{TV}}=0$, then $\mathbf{P}^\nu|_{\mathcal{F}^Y_+}\sim\mathbf{P}^{\nu'}|_{\mathcal{F}^Y_+}$.

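Proof. Choose any $A\in\mathcal{F}^Y_+$ such that $\mathbf{P}^{\nu'}(A)=0$. It suffices to prove that $\mathbf{P}^\nu(A)=0$. Indeed, this shows that $\mathbf{P}^\nu|_{\mathcal{F}^Y_+}\ll\mathbf{P}^{\nu'}|_{\mathcal{F}^Y_+}$, while the reverse statement follows as the assumptions are symmetric in $\nu$ and $\nu'$.

Fix for the time being $n\in\mathbb{N}$. Note that by construction

$$I_A(x,y)=I_A(y(0),\ldots,y(n),(y(k))_{k>n}).$$

Let us define the measurable function

$$a(y_0,\ldots,y_n,x_n)=\mathbf{E}^{x_n,y_n}\big(I_A(y_0,\ldots,y_n,(Y_k)_{k\ge1})\big).$$

Then, by the Markov property,

$$a(Y_0,\ldots,Y_n,X_n)=\mathbf{P}^\rho(A\mid\mathcal{F}^Y_{[0,n]}\vee\sigma(X_n)),\quad\mathbf{P}^\rho\text{-a.s.}$$

for any initial probability measure $\rho$. In particular,

$$a(Y_0,\ldots,Y_n,X_n)=0,\quad\mathbf{P}^{\nu'}\text{-a.s.}$$

Let $\mathbf{Q}^\eta$ be the law of the Markov chain $(Y_k)_{k\ge0}$ with initial measure $\eta$ and transition kernel $Q$, and $\mathbf{P}_0^{\xi'}$ be the law of the Markov chain $(X_k)_{k\ge0}$ with initial measure $\xi'$ and transition kernel $P_0$. By our assumptions,

$$\mathbf{P}^{\nu'}(A)=(\mathbf{P}_0^{\xi'}\otimes\mathbf{Q}^\eta)\left(I_A\,v'(X_0,Y_0)\prod_{i=0}^{n-1}g(X_i,Y_i,X_{i+1},Y_{i+1})\right)$$

for every $A\in\mathcal{F}_{[0,n]}$. In particular, the law of $(Y_0,\ldots,Y_n,X_n)$ under $\mathbf{P}^{\nu'}$ and the law $\mathbf{Q}^\eta|_{\mathcal{F}^Y_{[0,n]}}\otimes\xi'P_0^n$ are equivalent. Therefore,

$$a(Y_0,\ldots,Y_n,X_n)=0,\quad(\mathbf{Q}^\eta|_{\mathcal{F}^Y_{[0,n]}}\otimes\xi'P_0^n)\text{-a.s.}$$

Choose $S_n\in\mathcal{B}(E)$ such that $(\xi P_0^n)(\,\cdot\cap S_n)\ll\xi'P_0^n$ and $(\xi'P_0^n)(S_n^c)=0$ (so $S_n$ defines the Lebesgue decomposition of $\xi P_0^n$ with respect to $\xi'P_0^n$). Then

$$I_{S_n}(X_n)\,a(Y_0,\ldots,Y_n,X_n)=0,\quad(\mathbf{Q}^\eta|_{\mathcal{F}^Y_{[0,n]}}\otimes\xi P_0^n)\text{-a.s.}$$

Therefore,

$$a(Y_0,\ldots,Y_n,X_n)\le I_{S_n^c}(X_n),\quad(\mathbf{Q}^\eta|_{\mathcal{F}^Y_{[0,n]}}\otimes\xi P_0^n)\text{-a.s.}$$

But, as above, we find that the law of $(Y_0,\ldots,Y_n,X_n)$ under $\mathbf{P}^\nu$ is equivalent to $\mathbf{Q}^\eta|_{\mathcal{F}^Y_{[0,n]}}\otimes\xi P_0^n$. Therefore, we obtain immediately

$$a(Y_0,\ldots,Y_n,X_n)\le I_{S_n^c}(X_n),\quad\mathbf{P}^\nu\text{-a.s.}$$

Taking the expectation, we find that $\mathbf{P}^\nu(A)\le\mathbf{P}^\nu(X_n\in S_n^c)$.

At this point, we note that $n\in\mathbb{N}$ in the above construction was arbitrary. Moreover, we have already shown that for any $n\in\mathbb{N}$, the law of $X_n$ under $\mathbf{P}^{\nu'}$ is equivalent to $\xi'P_0^n$. Therefore $\mathbf{P}^{\nu'}(X_n\in S_n^c)=0$, and we find that

$$\mathbf{P}^\nu(A)\le\liminf_{n\to\infty}\mathbf{P}^\nu(X_n\in S_n^c)\le\liminf_{n\to\infty}\|\mathbf{P}^\nu(X_n\in\cdot\,)-\mathbf{P}^{\nu'}(X_n\in\cdot\,)\|_{\mathrm{TV}}=0.$$

Thus the proof is complete. □

A useful corollary is the following result.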
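Corollary 3.6. Suppose that Assumptions 2.6 and 2.8 hold, and let $\mu$ be a probability measure on $E\times F$ such that $\mu(E\times\cdot\,)\ll\pi(E\times\cdot\,)$ and

$$\mathbf{E}^\mu\big(\|\mathbf{P}^\mu(X_n\in\cdot\mid Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0.$$

Then $\mathbf{P}^\mu|_{\mathcal{F}^Y_+}\ll\mathbf{P}|_{\mathcal{F}^Y_+}$. If $\mu(E\times\cdot\,)\sim\pi(E\times\cdot\,)$, then $\mathbf{P}^\mu|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$.

Proof. We begin by noting that

$$\mathbf{E}\big(\|\mathbf{P}(X_n\in\cdot\mid Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)\le\mathbf{E}\big(\|\mathbf{P}^{X_0,Y_0}(X_n\in\cdot\,)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big).$$

Therefore, by Assumption 2.6,

$$\|\mathbf{P}(X_n\in\cdot\mid Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\text{in }\mathbf{P}\text{-probability.}$$

As $\mathbf{P}^\mu(Y_0\in\cdot\,)\ll\mathbf{P}(Y_0\in\cdot\,)$, this convergence is also in $\mathbf{P}^\mu$-probability. Therefore, using dominated convergence and the triangle inequality,

$$\mathbf{E}^\mu\big(\|\mathbf{P}^\mu(X_n\in\cdot\mid Y_0)-\mathbf{P}(X_n\in\cdot\mid Y_0)\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0.$$

By Fatou's lemma, we obtain

$$\liminf_{n\to\infty}\|\mathbf{P}^\mu(X_n\in\cdot\mid Y_0)-\mathbf{P}(X_n\in\cdot\mid Y_0)\|_{\mathrm{TV}}=0,\quad\mathbf{P}^\mu\text{-a.s.}$$

Let $\nu:F\times\mathcal{B}(E)\to[0,1]$, $\nu':F\times\mathcal{B}(E)\to[0,1]$ be versions of the regular conditional probabilities $\mathbf{P}^\mu(X_0\in\cdot\mid Y_0)$, $\mathbf{P}(X_0\in\cdot\mid Y_0)$, respectively. Then

$$\liminf_{n\to\infty}\|\mathbf{P}^{\nu(w,\cdot)\otimes\delta_w}(X_n\in\cdot\,)-\mathbf{P}^{\nu'(w,\cdot)\otimes\delta_w}(X_n\in\cdot\,)\|_{\mathrm{TV}}=0,\quad\mu(E\times\cdot\,)\text{-a.e. }w.$$

By Proposition 3.5, it follows that

$$\mathbf{P}^{\nu(w,\cdot)\otimes\delta_w}|_{\mathcal{F}^Y_+}\sim\mathbf{P}^{\nu'(w,\cdot)\otimes\delta_w}|_{\mathcal{F}^Y_+},\quad\mu(E\times\cdot\,)\text{-a.e. }w.$$

By the Lebesgue decomposition for kernels ([9], Section V.58), there is a measurable version of the Radon-Nikodym derivative. It follows that

$$\mathbf{P}^\mu|_{\mathcal{F}^Y_+}=\int\mathbf{P}^{\nu(w,\cdot)\otimes\delta_w}|_{\mathcal{F}^Y_+}\,\mu(E\times dw)\sim\int\mathbf{P}^{\nu'(w,\cdot)\otimes\delta_w}|_{\mathcal{F}^Y_+}\,\mu(E\times dw)\ll\int\mathbf{P}^{\nu'(w,\cdot)\otimes\delta_w}|_{\mathcal{F}^Y_+}\,\pi(E\times dw)=\mathbf{P}|_{\mathcal{F}^Y_+},$$

where we have used that $\mu(E\times\cdot\,)\ll\pi(E\times\cdot\,)$. If $\mu(E\times\cdot\,)\sim\pi(E\times\cdot\,)$, then clearly $\ll$ can be replaced by $\sim$ in the previous equation. □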

4. Proof of Theorem 2.9. The goal of this section is to prove Theorem 2.9. To this end, we begin by recalling the basic result from [20] on the ergodicity of Markov chains in random environments. This result will be used to establish that the unobservable process $(X_n)_{n\ge0}$ has trivial tail $\sigma$-field under the conditional measure $\mathbf{P}(\,\cdot\mid\mathcal{F}^Y)$. Finally, we show that $\mathbf{P}(X_0\in\cdot\mid\mathcal{F}^Y)\sim\mathbf{P}(X_0\in\cdot\mid\mathcal{F}^Y_+)$, which allows us to complete the proof by applying a result of von Weizsäcker [24].
4.1. Markov chains in random environments. We begin by recalling the relevant notions from [20], Section 2. A Markov chain in a random environment is defined by the following three ingredients:

(1) A probability kernel $P^X:E\times\Omega^Y\times\mathcal{B}(E)\to[0,1]$.

(2) A probability kernel $\varpi:\Omega^Y\times\mathcal{B}(E)\to[0,1]$ such that

$$\int P^X(z,y,A)\,\varpi(y,dz)=\varpi(\Theta y,A)\quad\text{for all }y\in\Omega^Y,\ A\in\mathcal{B}(E).$$

(3) A stationary probability measure $\mathbf{P}^Y$ on $(\Omega^Y,\mathcal{F}^Y)$.

The process $X_n$ is called a Markov chain in a random environment when

$$P^X(X_n,Y\circ\Theta^n,A)=\mathbf{P}(X_{n+1}\in A\mid\mathcal{F}^X_{]-\infty,n]}\vee\mathcal{F}^Y),\quad\mathbf{P}\text{-a.s.},$$

$$\varpi(Y\circ\Theta^n,A)=\mathbf{P}(X_n\in A\mid\mathcal{F}^Y),\quad\mathbf{P}\text{-a.s.}$$

for every $A\in\mathcal{B}(E)$ and $n\in\mathbb{Z}$, and $\mathbf{P}^Y=\mathbf{P}|_{\mathcal{F}^Y}$. One should think of a Markov chain in a random environment $X_n$ as a process that is Markov conditionally on the environment $Y$. The conditional chain is time-inhomogeneous but must satisfy certain stationarity properties: the environment is stationary and the (time-dependent) conditional transition probabilities $P^X(\,\cdot\,,Y\circ\Theta^n,\cdot\,)$ and quasi-invariant measure $\varpi(Y\circ\Theta^n,\cdot\,)$ are themselves stationary processes with respect to the environment. The stationarity properties ensure that Markov chains in random environments behave "almost" like time-homogeneous Markov chains; cf. Theorem 4.1 below.

Let us introduce a probability kernel $\mathbf{P}_{\cdot,\cdot}:E\times\Omega^Y\times\mathcal{F}^X_+\to[0,1]$ so that

$$\mathbf{P}_{z,y}(A)=\int I_A(x)\,P^X(x(n-1),\Theta^{n-1}y,dx(n))\cdots P^X(x(1),\Theta y,dx(2))\,P^X(x(0),y,dx(1))\,\delta_z(dx(0))$$

for $A\in\mathcal{F}^X_{[0,n]}$. It is easily seen that $\mathbf{P}_{z,y}$ is a version of the regular conditional probability $\mathbf{P}((X_k)_{k\ge0}\in\cdot\mid\mathcal{F}^X_0\vee\mathcal{F}^Y)$. We can now state the following ergodic theorem for Markov chains in random environments ([20], Theorem 2.3).

Theorem 4.1. The following are equivalent.

(1) $\|\mathbf{P}_{z,y}(X_n\in\cdot\,)-\mathbf{P}_{z',y}(X_n\in\cdot\,)\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0$ for $(\varpi\otimes\varpi)\mathbf{P}^Y$-a.e. $(z,z',y)$.

(2) The tail $\sigma$-field $\mathcal{T}^X=\bigcap_{n\ge0}\mathcal{F}^X_{[n,\infty[}$ is a.s. trivial in the following sense:

$$\mathbf{P}_{z,y}(A)=\mathbf{P}_{z,y}(A)^2=\mathbf{P}_{z',y}(A)\quad\text{for all }A\in\mathcal{T}^X\text{ and }(z,z',y)\in H,$$

where $H$ is a fixed set (independent of $A$) of $(\varpi\otimes\varpi)\mathbf{P}^Y$-full measure.

(3) For $(\varpi\otimes\varpi)\mathbf{P}^Y$-a.e. $(z,z',y)$, there is an $n\in\mathbb{N}$ such that the measures $\mathbf{P}_{z,y}(X_n\in\cdot\,)$ and $\mathbf{P}_{z',y}(X_n\in\cdot\,)$ are not mutually singular.
4.2. Weak ergodicity of the conditional process. Our first order of business is to establish that, under the model defined in this paper, $X_n$ is indeed a Markov chain in a random environment in the sense of Section 4.1, where the observations $Y$ play the role of the environment; that is, we must show that the unobserved process $X_n$ is still a Markov chain conditionally on the observations $Y$ satisfying the requisite stationarity properties. This is the statement of the following lemma, whose proof is omitted as it is identical to that in [20]. As everything that follows is based on this elementary fact, however, let us briefly sketch why the result is true for the convenience of the reader. It is easily seen that

$$\mathbf{P}(X_{n+1}\in\cdot\mid\mathcal{F}^X_{]-\infty,n]}\vee\mathcal{F}^Y)\circ\Theta^{-n}=\mathbf{P}(X_1\in\cdot\mid\mathcal{F}^X_-\vee\mathcal{F}^Y)=\mathbf{P}(X_1\in\cdot\mid\sigma\{X_0\}\vee\mathcal{F}^Y).$$

The first equality follows from stationarity of $\mathbf{P}$, and the second equality follows as $\mathcal{F}_{[1,\infty[}$ is conditionally independent of $\mathcal{F}_-$ given $\sigma\{X_0,Y_0\}$ by the Markov property of $(X_n,Y_n)_{n\in\mathbb{Z}}$. We can therefore choose $P^X$ to be a regular version of $\mathbf{P}(X_1\in\cdot\mid\sigma\{X_0\}\vee\mathcal{F}^Y)$. Similarly, we can choose $\varpi$ to be a regular version of $\mathbf{P}(X_0\in\cdot\mid\mathcal{F}^Y)$, and $\mathbf{P}^Y$ to be the law of $Y$. It is now an elementary exercise to check that these kernels do indeed characterize the process $X_n$ as a Markov chain in a random environment in the sense of Section 4.1.

Lemma 4.2. There exist probability kernels $P^X:E\times\Omega^Y\times\mathcal{B}(E)\to[0,1]$ and $\varpi:\Omega^Y\times\mathcal{B}(E)\to[0,1]$, and a probability measure $\mathbf{P}^Y$ on $(\Omega^Y,\mathcal{F}^Y)$, such that the conditions of Section 4.1 are satisfied.

Proof. The proof is identical to that of [20], Lemma 3.3. □

The main goal of this subsection is to prove the following theorem.

Theorem 4.3. Suppose that both Assumptions 2.6 and 2.8 are in force. Then any (hence, all) of the conditions of Theorem 4.1 hold true.

The strategy of the proof of Theorem 4.3 is to show that condition (3) of Theorem 4.1 follows from Assumptions 2.6 and 2.8. To this end, we begin by proving that Theorem 4.3 would follow if we can establish equivalence of the conditional and unconditional transition kernels $P^X$ and $P$.

Lemma 4.4. Suppose that Assumptions 2.6 and 2.8 are in force, and that there exists a strictly positive measurable function $h:E\times\Omega^Y\times E\to\,]0,\infty[$ such that

$$P^X(z,y,A)=\int I_A(\tilde z)\,h(z,y,\tilde z)\,P(z,y(0),d\tilde z,d\tilde w)\quad\text{for all }A\in\mathcal{B}(E)$$

for $\varpi\mathbf{P}^Y$-a.e. $(z,y)$. Then condition (3) of Theorem 4.1 holds.

Proof. By Assumption 2.6 and the triangle inequality

$$\int\|\mathbf{P}^{z,y(0)}(X_n\in\cdot\,)-\mathbf{P}^{z',y(0)}(X_n\in\cdot\,)\|_{\mathrm{TV}}\,\varpi(y,dz)\,\varpi(y,dz')\,\mathbf{P}^Y(dy)\xrightarrow{n\to\infty}0.$$

By Fatou's lemma, there is a set $H_1$ of $(\varpi\otimes\varpi)\mathbf{P}^Y$-full measure such that

$$\liminf_{n\to\infty}\|\mathbf{P}^{z,y(0)}(X_n\in\cdot\,)-\mathbf{P}^{z',y(0)}(X_n\in\cdot\,)\|_{\mathrm{TV}}=0\quad\text{for all }(z,z',y)\in H_1.$$

In particular, there is for every $(z,z',y)\in H_1$ an $n\in\mathbb{N}$ such that $\mathbf{P}^{z,y(0)}(X_n\in\cdot\,)$ and $\mathbf{P}^{z',y(0)}(X_n\in\cdot\,)$ are not mutually singular.

Now let $H_2$ be a set of $\varpi\mathbf{P}^Y$-full measure such that the absolute continuity condition in the statement of the lemma holds true for all $(z,y)\in H_2$. By Lemma A.1, there is a subset $H_3\subset H_2$ of $\varpi\mathbf{P}^Y$-full measure such that for every $(z,y)\in H_3$ we have $\mathbf{P}_{z,y}((X_n,\Theta^ny)\in H_2\text{ for all }n\ge0)=1$. It follows directly that for every $(z,y)\in H_3$, $n\in\mathbb{N}$ and $A\in\mathcal{B}(E)$, we have

$$\mathbf{P}_{z,y}(X_n\in A)=\int I_A(x_n)\,f(x_0,\ldots,x_n,y)\prod_{i=0}^{n-1}P_0(x_i,dx_{i+1})\,\delta_z(dx_0),$$

where we have defined the strictly positive measurable function

$$f(x_0,\ldots,x_n,y)=\prod_{i=0}^{n-1}h(x_i,\Theta^iy,x_{i+1})\int g(x_i,y(i),x_{i+1},w)\,Q(y(i),dw).$$

On the other hand, we have for every $z,y$

$$\mathbf{P}^{z,y(0)}(X_n\in A)=\int I_A(x_n)\,f'(x_0,\ldots,x_n,y(0))\prod_{i=0}^{n-1}P_0(x_i,dx_{i+1})\,\delta_z(dx_0),$$

where we have defined the strictly positive measurable function

$$f'(x_0,\ldots,x_n,y_0)=\int\prod_{i=0}^{n-1}g(x_i,y_i,x_{i+1},y_{i+1})\,Q(y_i,dy_{i+1}).$$

Therefore $\mathbf{P}_{z,y}(X_n\in\cdot\,)\sim\mathbf{P}^{z,y(0)}(X_n\in\cdot\,)$ for all $(z,y)\in H_3$ and $n\in\mathbb{N}$. To complete the proof, define the following set:

$$H_4=\{(z,z',y):(z,z',y)\in H_1,\ (z,y),(z',y)\in H_3\}.$$

Then $H_4$ has $(\varpi\otimes\varpi)\mathbf{P}^Y$-full measure, and for every $(z,z',y)\in H_4$, there is an $n\in\mathbb{N}$ such that $\mathbf{P}_{z,y}(X_n\in\cdot\,)$ and $\mathbf{P}_{z',y}(X_n\in\cdot\,)$ are not mutually singular. This establishes condition (3) of Theorem 4.1. □

We now proceed to prove the following lemma, which verifies the assumption of Lemma 4.4. This completes the proof of Theorem 4.3.




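Lemma 4.5. Suppose that Assumptions 2.6 and 2.8 hold. Then there exists a strictly positive measurable function $h:E\times\Omega^Y\times E\to\,]0,\infty[$ such that

$$P^X(z,y,A)=\int_{E\times F}I_A(\tilde z)\,h(z,y,\tilde z)\,P(z,y(0),d\tilde z,d\tilde w)\quad\text{for all }A\in\mathcal{B}(E),$$

for $\varpi\mathbf{P}^Y$-a.e. $(z,y)$.

Proof. By definition, $P^X$ is a version of the regular conditional probability $\mathbf{P}(X_1\in\cdot\mid\mathcal{F}^X_-\vee\mathcal{F}^Y)$. But by the Markov property of $(X_n,Y_n)_{n\in\mathbb{Z}}$, the $\sigma$-fields $\mathcal{F}_{[1,\infty[}$ and $\mathcal{F}_-$ are conditionally independent given $\sigma(X_0,Y_0)$. Therefore, $P^X$ is, in fact, a version of the regular conditional probability $\mathbf{P}(X_1\in\cdot\mid\sigma(X_0,Y_0)\vee\mathcal{F}^Y_{[1,\infty[})$. Moreover, clearly the kernel $\bar P$ defined as

$$\bar P(z,w,A)=\int I_A(\tilde z)\,P(z,w,d\tilde z,d\tilde w)\quad\text{for all }A\in\mathcal{B}(E),\ (z,w)\in E\times F$$

is a version of the regular conditional probability $\mathbf{P}(X_1\in\cdot\mid\sigma(X_0,Y_0))$. Finally, we fix throughout the proof arbitrary versions $R:E\times F\times\mathcal{F}^Y_{[1,\infty[}\to[0,1]$ and $R^X:E\times F\times E\times\mathcal{F}^Y_{[1,\infty[}\to[0,1]$ of the regular conditional probabilities $\mathbf{P}((Y_k)_{k\ge1}\in\cdot\mid\sigma(X_0,Y_0))$ and $\mathbf{P}((Y_k)_{k\ge1}\in\cdot\mid\sigma(X_0,Y_0,X_1))$, respectively.

To complete the proof, it suffices to show that $R^X(z,w,z',\cdot\,)\sim R(z,w,\cdot\,)$ for $(z,w,z')\in H$ with $\mathbf{P}((X_0,Y_0,X_1)\in H)=1$. Indeed, if this is the case, then by the Lebesgue decomposition for kernels ([9], Section V.58), there is a strictly positive measurable function $h:E\times\Omega^Y\times E\to\,]0,\infty[$ such that

$$R^X(z,y(0),\tilde z,A)=\int I_A((y(i))_{i\ge1})\,h(z,y,\tilde z)\,R(z,y(0),d(y(i))_{i\ge1})$$

for all $A\in\mathcal{F}^Y_{[1,\infty[}$ and $(z,y(0),\tilde z)\in H'$ with $\mathbf{P}((X_0,Y_0,X_1)\in H')=1$. It remains to apply Lemma A.2 to the law of the triple $((X_0,Y_0),X_1,(Y_k)_{k\ge1})$.

It therefore remains to show that $R^X(z,w,z',\cdot\,)\sim R(z,w,\cdot\,)$. To this end, let us introduce convenient versions of the regular conditional probabilities $R$ and $R^X$. Note that we can write for $A\in\mathcal{F}^Y_+$

$$R(X_0,Y_0,A\circ\Theta)=\mathbf{E}(\mathbf{P}^{X_1,Y_1}(A)\mid\sigma(X_0,Y_0))=\mathbf{P}^{\nu_{X_0,Y_0}}(A)$$

by the Markov property of $(X_n,Y_n)_{n\ge0}$, where we have defined

$$\nu_{z,w}(d\tilde z,d\tilde w)=P(z,w,d\tilde z,d\tilde w)=g(z,w,\tilde z,\tilde w)\,P_0(z,d\tilde z)\,Q(w,d\tilde w).$$

On the other hand, using the Bayes formula, we can compute for $A\in\mathcal{F}^Y_+$

$$R^X(X_0,Y_0,X_1,A\circ\Theta)=\mathbf{E}(\mathbf{P}^{X_1,Y_1}(A)\mid\sigma(X_0,Y_0,X_1))=\mathbf{P}^{\nu_{X_0,Y_0,X_1}}(A),$$

where we have defined

$$\nu_{z,w,z'}(d\tilde z,d\tilde w)=\frac{g(z,w,z',\tilde w)}{\int g(z,w,z',w')\,Q(w,dw')}\,\delta_{z'}(d\tilde z)\,Q(w,d\tilde w).$$

It therefore suffices to show that $\mathbf{P}^{\nu_{z,w,z'}}|_{\mathcal{F}^Y_+}\sim\mathbf{P}^{\nu_{z,w}}|_{\mathcal{F}^Y_+}$ for $(z,w,z')\in H$ with $\mathbf{P}((X_0,Y_0,X_1)\in H)=1$. By Proposition 3.5, it suffices to show that

$$\liminf_{n\to\infty}\|\mathbf{P}^{\nu_{z,w,z'}}(X_n\in\cdot\,)-\mathbf{P}^{\nu_{z,w}}(X_n\in\cdot\,)\|_{\mathrm{TV}}=0$$

for $(z,w,z')\in H$ with $\mathbf{P}((X_0,Y_0,X_1)\in H)=1$. But

$$\mathbf{E}\big(\|\mathbf{P}^{\nu_{X_0,Y_0,X_1}}(X_n\in\cdot\,)-\mathbf{P}^{\nu_{X_0,Y_0}}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)$$

$$=\mathbf{E}\big(\|\mathbf{P}(X_{n+1}\in\cdot\mid X_0,Y_0,X_1)-\mathbf{P}(X_{n+1}\in\cdot\mid X_0,Y_0)\|_{\mathrm{TV}}\big)$$

$$\le\mathbf{E}\big(\|\mathbf{P}(X_{n+1}\in\cdot\mid X_0,Y_0,X_1,Y_1)-\mathbf{P}(X_{n+1}\in\cdot\,)\|_{\mathrm{TV}}\big)+\mathbf{E}\big(\|\mathbf{P}(X_{n+1}\in\cdot\mid X_0,Y_0)-\mathbf{P}(X_{n+1}\in\cdot\,)\|_{\mathrm{TV}}\big)$$

$$=\mathbf{E}\big(\|\mathbf{P}(X_n\in\cdot\mid X_0,Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)+\mathbf{E}\big(\|\mathbf{P}(X_{n+1}\in\cdot\mid X_0,Y_0)-\mathbf{P}(X_{n+1}\in\cdot\,)\|_{\mathrm{TV}}\big),$$

where we have used the triangle inequality and the stationarity of $\mathbf{P}$. Thus the result follows from Assumption 2.6 and Fatou's lemma. □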

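4.3. Exchange of intersection and supremum of $\sigma$-fields. Fix a version $\varpi^+:\Omega^Y\times\mathcal{B}(E)\to[0,1]$ of the regular conditional probability $\mathbf{P}(X_0\in\cdot\mid\mathcal{F}^Y_+)$. We begin by establishing the validity of the exchange of intersection and supremum in Theorem 2.9 assuming that $\varpi^+$ has a positive density with respect to $\varpi$.

Proposition 4.6. Suppose Assumptions 2.6 and 2.8 hold, and that there exists a strictly positive measurable function $k:\Omega^Y\times E\to\,]0,\infty[$ such that

$$\varpi(y,A)=\int I_A(z)\,k(y,z)\,\varpi^+(y,dz)\quad\text{for all }A\in\mathcal{B}(E)$$

for $\mathbf{P}^Y$-a.e. $y\in\Omega^Y$. Then $\bigcap_{n\ge0}\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[}=\mathcal{F}^Y_+$ $\mathbf{P}$-a.s.

Proof. By Theorem 4.3, there is a set $H$ of $(\varpi\otimes\varpi)\mathbf{P}^Y$-full measure with

$$\mathbf{P}_{z,y}(A)=\mathbf{P}_{z,y}(A)^2=\mathbf{P}_{z',y}(A)\quad\text{for all }A\in\mathcal{T}^X\text{ and }(z,z',y)\in H.$$

As $H$ has $(\varpi\otimes\varpi)\mathbf{P}^Y$-full measure, there clearly exists a set $H^Y\in\mathcal{B}(\Omega^Y)$ of $\mathbf{P}^Y$-full measure such that $\int I_H(z,z',y)\,\varpi(y,dz)\,\varpi(y,dz')=1$ for all $y\in H^Y$. Let us now define $\mathbf{P}_y(A)=\int\mathbf{P}_{z,y}(A)\,\varpi(y,dz)$. Then

$$\mathbf{P}_y(A)-\mathbf{P}_y(A)^2=\int I_H(z,z',y)\,\mathbf{P}_{z,y}(A)\,(1-\mathbf{P}_{z',y}(A))\,\varpi(y,dz)\,\varpi(y,dz')=0$$

for every $y\in H^Y$ and $A\in\mathcal{T}^X$. Thus $\mathcal{T}^X$ is $\mathbf{P}_y$-trivial for all $y\in H^Y$. Therefore, defining $\mathbf{P}^+_y(A)=\int\mathbf{P}_{z,y}(A)\,\varpi^+(y,dz)$, our assumption that $\varpi^+(y,\cdot\,)\sim\varpi(y,\cdot\,)$ for $\mathbf{P}^Y$-a.e. $y\in\Omega^Y$ implies that $\mathcal{T}^X$ is $\mathbf{P}^+_y$-trivial for $\mathbf{P}^Y$-a.e. $y\in\Omega^Y$.

Now recall that, by definition, $\mathbf{P}_{z,y}$ is a version of the regular conditional probability $\mathbf{P}((X_k)_{k\ge0}\in\cdot\mid\sigma(X_0)\vee\mathcal{F}^Y)$. But as $\mathcal{F}^Y_-$ is conditionally independent of $\mathcal{F}_+$ given $\sigma(X_0,Y_0)$ by the Markov property, it follows that $\mathbf{P}_{z,y}$ is even a version of $\mathbf{P}((X_k)_{k\ge0}\in\cdot\mid\sigma(X_0)\vee\mathcal{F}^Y_+)$. It therefore follows that $\mathbf{P}^+_y$ is a version of the regular conditional probability $\mathbf{P}((X_k)_{k\ge0}\in\cdot\mid\mathcal{F}^Y_+)$. We have therefore shown that $\mathcal{T}^X$ is $\mathbf{P}(\,\cdot\mid\mathcal{F}^Y_+)$-trivial $\mathbf{P}$-a.s., which implies

$$\bigcap_{n\ge0}\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[}=\mathcal{F}^Y_+\quad\mathbf{P}\text{-a.s.}$$

by Lemma A.4 in Appendix A. □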

To prove Theorem 2.9, we must therefore establish that $\varpi^+$ has a positive density with respect to $\varpi$. It is here that the time-reversed Assumption 2.7 enters the picture; indeed, the alert reader will not have failed to notice that we have only used Assumptions 2.6 and 2.8 up to this point.

Lemma 4.7. Suppose that Assumptions 2.6-2.8 are in force. Then there exists a strictly positive measurable function $k:\Omega^Y\times E\to\,]0,\infty[$ such that

$$\varpi(y,A)=\int I_A(z)\,k(y,z)\,\varpi^+(y,dz)\quad\text{for all }A\in\mathcal{B}(E)$$

for $\mathbf{P}^Y$-a.e. $y\in\Omega^Y$.

Proof. By the Markov property of $(X_n,Y_n)_{n\in\mathbb{Z}}$, we find that $\mathbf{P}^{z,y_0}((Y_k)_{k<0}\in\cdot\,)$ and $\mathbf{P}^{\varpi^+(y,\cdot)\otimes\delta_{y_0}}((Y_k)_{k<0}\in\cdot\,)$ are versions of the regular conditional probabilities $\mathbf{P}((Y_k)_{k<0}\in\cdot\mid\sigma(X_0)\vee\mathcal{F}^Y_+)$ and $\mathbf{P}((Y_k)_{k<0}\in\cdot\mid\mathcal{F}^Y_+)$, respectively. Applying Lemma A.2 to $((Y_k)_{k\ge0},X_0,(Y_k)_{k<0})$, it suffices to show that

$$\mathbf{P}^{z,y_0}((Y_k)_{k<0}\in\cdot\,)\sim\mathbf{P}^{\varpi^+(y,\cdot)\otimes\delta_{y_0}}((Y_k)_{k<0}\in\cdot\,)$$

for $\varpi\mathbf{P}^Y$-a.e. $(z,y)$. By Lemma 3.4, we may apply Proposition 3.5 to the reverse-time model. Therefore, it suffices to prove that

$$\liminf_{n\to\infty}\|\mathbf{P}^{z,y_0}(X_{-n}\in\cdot\,)-\mathbf{P}^{\varpi^+(y,\cdot)\otimes\delta_{y_0}}(X_{-n}\in\cdot\,)\|_{\mathrm{TV}}=0$$

for $\varpi\mathbf{P}^Y$-a.e. $(z,y)$. To this end, let us note that

$$\mathbf{E}\big(\|\mathbf{P}^{X_0,Y_0}(X_{-n}\in\cdot\,)-\mathbf{P}^{\varpi^+(Y,\cdot)\otimes\delta_{Y_0}}(X_{-n}\in\cdot\,)\|_{\mathrm{TV}}\big)$$

$$\le\mathbf{E}\big(\|\mathbf{P}^{X_0,Y_0}(X_{-n}\in\cdot\,)-\mathbf{P}(X_{-n}\in\cdot\,)\|_{\mathrm{TV}}\big)+\mathbf{E}\big(\|\mathbf{P}(X_{-n}\in\cdot\mid\mathcal{F}^Y_+)-\mathbf{P}(X_{-n}\in\cdot\,)\|_{\mathrm{TV}}\big)$$

$$\le2\,\mathbf{E}\big(\|\mathbf{P}^{X_0,Y_0}(X_{-n}\in\cdot\,)-\mathbf{P}(X_{-n}\in\cdot\,)\|_{\mathrm{TV}}\big).$$

Thus the result follows by Assumption 2.7 and Fatou's lemma. □

We now complete the proof of Theorem 2.9.

Proof. The first part of Theorem 2.9 follows immediately from Proposition 4.6 and Lemma 4.7. Now note that by Lemma 3.4, Assumptions 2.6-2.8 still hold if we replace the model $(X_n,Y_n)_{n\in\mathbb{Z}}$ by the time-reversed model $(X_{-n},Y_{-n})_{n\in\mathbb{Z}}$. Therefore, the second part of Theorem 2.9 follows immediately from the first part by time reversal. □

5. Proof of Theorem 2.10. The goal of this section is to prove Theo- 
rem 2.10. We begin by recalling some basic properties of the filter. Then, 
we prove Theorem 2.10 first for a special case, then in the general case by 
a recursive argument. 

5.1. Preliminaries. Recall that $\Pi^\mu_n$ is defined as a version of the regular conditional probability $\mathbf{P}^\mu(X_n\in\cdot\mid\mathcal{F}^Y_{[0,n]})$. Of course, we are free to choose an arbitrary version of the filter, as the statement of Theorem 2.10 does not depend on the choice of version (this follows from Corollary 3.6). Nonetheless, we will find it convenient in our proofs to work with specific versions of these regular conditional probabilities, which we define presently.

For notational simplicity, we introduce the following device: for every probability measure $\mu$ on $E\times F$, we fix a probability kernel $\mu_\cdot:F\times\mathcal{B}(E)\to[0,1]$ such that $\mu_{Y_0}(A)=\mathbf{P}^\mu(X_0\in A\mid Y_0)$ for all $A\in\mathcal{B}(E)$ [i.e., $\mu_\cdot$ is a version of the regular conditional probability $\mathbf{P}^\mu(X_0\in\cdot\mid Y_0)$].

Lemma 5.1. Suppose that Assumption 2.8 holds. For every probability measure $\mu$ on $E\times F$, we define a sequence of probability kernels $\Pi^\mu_n:\Omega^Y\times\mathcal{B}(E)\to[0,1]$ ($n\ge0$) through the following recursion:

$$\Pi^\mu_n(y,A)=\frac{\int I_A(z)\,g(z',y(n-1),z,y(n))\,P_0(z',dz)\,\Pi^\mu_{n-1}(y,dz')}{\int g(z',y(n-1),z,y(n))\,P_0(z',dz)\,\Pi^\mu_{n-1}(y,dz')},\qquad\Pi^\mu_0(y,A)=\mu_{y(0)}(A).$$

Then $\Pi^\mu_n$ is a version of the regular conditional probability $\mathbf{P}^\mu(X_n\in\cdot\mid\mathcal{F}^Y_{[0,n]})$ for every $n\ge0$. Moreover, $\Pi^\mu_n(y,\cdot\,)\sim\mathbf{P}^{\mu_{y(0)}\otimes\delta_{y(0)}}(X_n\in\cdot\,)$ for all $y,n$.

Proof. By construction, we have

$$\mathbf{P}^\mu(X_0\in dx_0,\ldots,X_n\in dx_n,Y_0\in dy_0,\ldots,Y_n\in dy_n)=\mu(E\times dy_0)\,\mu_{y_0}(dx_0)\prod_{i=0}^{n-1}g(x_i,y_i,x_{i+1},y_{i+1})\,P_0(x_i,dx_{i+1})\,Q(y_i,dy_{i+1}).$$

Therefore, the Bayes formula gives for any $A\in\mathcal{B}(E)$

$$\mathbf{P}^\mu(X_n\in A\mid\mathcal{F}^Y_{[0,n]})=\frac{\int I_A(x_n)\,\mu_{Y_0}(dx_0)\prod_{i=0}^{n-1}g(x_i,Y_i,x_{i+1},Y_{i+1})\,P_0(x_i,dx_{i+1})}{\int\mu_{Y_0}(dx_0)\prod_{i=0}^{n-1}g(x_i,Y_i,x_{i+1},Y_{i+1})\,P_0(x_i,dx_{i+1})}.$$

This clearly coincides with the recursive definition of $\Pi^\mu_n$. Moreover, it follows directly that $\Pi^\mu_n(y,\cdot\,)\sim\mu_{y(0)}P_0^n$ for all $y,n$. But note that

$$\mathbf{P}^{\mu_w\otimes\delta_w}(X_n\in A)=\int I_A(x_n)\,\mu_w(dx_0)\,f(w,x_0,\ldots,x_n)\prod_{i=0}^{n-1}P_0(x_i,dx_{i+1}),$$

where we have defined

$$f(y_0,x_0,\ldots,x_n)=\int\prod_{i=0}^{n-1}g(x_i,y_i,x_{i+1},y_{i+1})\,Q(y_i,dy_{i+1}).$$

Therefore, $\Pi^\mu_n(y,\cdot\,)\sim\mu_{y(0)}P_0^n\sim\mathbf{P}^{\mu_{y(0)}\otimes\delta_{y(0)}}(X_n\in\cdot\,)$ for every $y,n$. □

Throughout the remainder of this section, the nonlinear filter $\Pi^\mu_n$ will always be assumed to be chosen according to the particular version defined in Lemma 5.1. This entails no loss of generality in our final results.
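For a finite hidden state space the recursion of Lemma 5.1 becomes a weighted matrix-vector product followed by a normalization. The following is a minimal sketch under assumptions of my own: `filter_step` and `run_filter` are hypothetical helper names, the kernel `P0` and the density `g` are made-up, and `g` is chosen so that it integrates to one against `P0` and a uniform `Q` on $\{0,1\}$.

```python
def filter_step(prev, y_prev, y_new, g, P0):
    """One step of the recursion; prev is the filter at time n-1 (a list over E)."""
    E = range(len(prev))
    unnorm = [sum(g(z1, y_prev, z, y_new) * P0[z1][z] * prev[z1] for z1 in E)
              for z in E]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run_filter(prior, ys, g, P0):
    """Return [Pi_0, ..., Pi_n] for the observation path ys = [y(0), ..., y(n)]."""
    out = [list(prior)]
    for n in range(1, len(ys)):
        out.append(filter_step(out[-1], ys[n - 1], ys[n], g, P0))
    return out


# Hypothetical two-state example; g is strictly positive as in Assumption 2.8.
P0 = [[0.9, 0.1], [0.2, 0.8]]


def g(z1, w1, z, w):
    """Density of the pair kernel with respect to P0 (x) Q, with Q uniform on {0, 1}."""
    return 1.5 if z == w else 0.5


prior = [0.5, 0.5]            # plays the role of mu_{y(0)}
ys = [0, 0, 1, 1, 1, 0]       # an arbitrary observation path
path = run_filter(prior, ys, g, P0)
for n, p in enumerate(path):
    print(n, [round(v, 4) for v in p])
```

Because `g` is strictly positive, every filter in `path` charges every state that $\mu_{y(0)}P_0^n$ charges, which is the finite-state form of the equivalence stated at the end of the lemma.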

Remark 5.2. From the recursive formula for $\Pi^\mu_n$, we can read off that

$$\Pi^\mu_{n+m}(y,A)=\Pi_m^{\Pi^\mu_n(y,\cdot)\otimes\delta_{y(n)}}(\Theta^ny,A)\quad\text{for all }n,m\ge0,\ y\in\Omega^Y,\ A\in\mathcal{B}(E).$$

This recursive property will play an important role in our proof. One of the advantages of our specific choice of version of the filter is that this property holds pathwise, so that we need not worry about the joint measurability of $\Pi^\mu_n(y,\cdot\,)$ with respect to $(y,\mu)$. Of course, our choice of version is not essential and technicalities of this kind could certainly be resolved more generally if one were so inclined.


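5.2. The absolutely continuous case. We begin by obtaining an explicit formula for the limit of $\|\Pi^\mu_n-\Pi^\nu_n\|_{\mathrm{TV}}$ for absolutely continuous measures $\mu\ll\nu$. This result will be applied recursively in the proof of Theorem 2.10.

Proposition 5.3. For any probability measures $\mu,\nu$ on $E\times F$ with $\mu\ll\nu$

$$\limsup_{n\to\infty}\|\Pi^\mu_n-\Pi^\nu_n\|_{\mathrm{TV}}=\frac{\mathbf{E}^\nu\Big(\Big|\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\,\Big|\,\bigcap_{n\ge0}\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[}\Big)-\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\,\Big|\,\mathcal{F}^Y_+\Big)\Big|\,\Big|\,\mathcal{F}^Y_+\Big)}{\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\,\Big|\,\mathcal{F}^Y_+\Big)},\quad\mathbf{P}^\mu\text{-a.s.}$$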

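Proof. As $d\mathbf{P}^\mu/d\mathbf{P}^\nu=(d\mu/d\nu)(X_0,Y_0)$ by the Markov property, we have

$$\mathbf{P}^\mu(X_n\in A\mid\mathcal{F}^Y_{[0,n]})=\frac{\mathbf{E}^\nu\big(I_A(X_n)\,\mathbf{E}^\nu((d\mu/d\nu)(X_0,Y_0)\mid\sigma(X_n)\vee\mathcal{F}^Y_{[0,n]})\mid\mathcal{F}^Y_{[0,n]}\big)}{\mathbf{E}^\nu((d\mu/d\nu)(X_0,Y_0)\mid\mathcal{F}^Y_{[0,n]})},$$

$\mathbf{P}^\mu$-a.s. by the Bayes formula. Therefore, we evidently have

$$\frac{d\Pi^\mu_n}{d\Pi^\nu_n}(X_n)=\frac{\mathbf{E}^\nu((d\mu/d\nu)(X_0,Y_0)\mid\sigma(X_n)\vee\mathcal{F}^Y_{[0,n]})}{\mathbf{E}^\nu((d\mu/d\nu)(X_0,Y_0)\mid\mathcal{F}^Y_{[0,n]})}.$$

In particular, we can write $\mathbf{P}^\mu$-a.s.

$$\|\Pi^\mu_n-\Pi^\nu_n\|_{\mathrm{TV}}=\int\left|\frac{d\Pi^\mu_n}{d\Pi^\nu_n}(x)-1\right|\Pi^\nu_n(dx)=\frac{\mathbf{E}^\nu(M_n\mid\mathcal{F}^Y_{[0,n]})}{\mathbf{E}^\nu((d\mu/d\nu)(X_0,Y_0)\mid\mathcal{F}^Y_{[0,n]})},$$

where we have defined

$$M_n=\left|\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\,\Big|\,\sigma(X_n)\vee\mathcal{F}^Y_{[0,n]}\Big)-\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\,\Big|\,\mathcal{F}^Y_{[0,n]}\Big)\right|.$$

Thus it is easily seen that

$$\limsup_{n\to\infty}\|\Pi^\mu_n-\Pi^\nu_n\|_{\mathrm{TV}}=\frac{\limsup_{n\to\infty}\mathbf{E}^\nu(M_n\mid\mathcal{F}^Y_{[0,n]})}{\mathbf{E}^\nu((d\mu/d\nu)(X_0,Y_0)\mid\mathcal{F}^Y_+)}.$$

Now note that, by the Markov property, $\mathcal{F}_{[n+1,\infty[}$ and $\sigma(X_0)\vee\mathcal{F}^Y_{[0,n-1]}$ are conditionally independent given $\sigma(X_n,Y_n)$. Therefore,

$$M_n=\left|\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\,\Big|\,\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[}\Big)-\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\,\Big|\,\mathcal{F}^Y_{[0,n]}\Big)\right|.$$

If $d\mu/d\nu$ were uniformly bounded, the result would follow directly from the martingale convergence theorem and Hunt's lemma ([9], Theorem V.45). In the case that $d\mu/d\nu$ is unbounded, define the truncated process

$$M^k_n=\left|\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\wedge k\,\Big|\,\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[}\Big)-\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)\wedge k\,\Big|\,\mathcal{F}^Y_{[0,n]}\Big)\right|.$$

By Hunt's lemma and dominated convergence,

$$\lim_{k\to\infty}\lim_{n\to\infty}\mathbf{E}^\nu(M^k_n\mid\mathcal{F}^Y_{[0,n]})=\mathbf{E}^\nu(M_\infty\mid\mathcal{F}^Y_+),\quad\mathbf{P}^\nu\text{-a.s.},$$

where $M_\infty=\lim_{n\to\infty}M_n$. Therefore, we obtain $\mathbf{P}^\nu$-a.s.

$$\limsup_{n\to\infty}\mathbf{E}^\nu(M_n\mid\mathcal{F}^Y_{[0,n]})=\mathbf{E}^\nu(M_\infty\mid\mathcal{F}^Y_+)+\limsup_{k\to\infty}\limsup_{n\to\infty}\mathbf{E}^\nu(M_n-M^k_n\mid\mathcal{F}^Y_{[0,n]}).$$

It remains to note that the second term vanishes $\mathbf{P}^\nu$-a.s.,

$$\limsup_{k\to\infty}\limsup_{n\to\infty}\mathbf{E}^\nu(|M_n-M^k_n|\mid\mathcal{F}^Y_{[0,n]})\le2\limsup_{k\to\infty}\limsup_{n\to\infty}\mathbf{E}^\nu\Big(\frac{d\mu}{d\nu}(X_0,Y_0)-\frac{d\mu}{d\nu}(X_0,Y_0)\wedge k\,\Big|\,\mathcal{F}^Y_{[0,n]}\Big)=0.$$

The proof is complete. □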



5.3. The general case. In the special case where $\mu\ll\pi$, Theorem 2.10 follows directly from Proposition 5.3 and Theorem 2.9. An additional step is needed, however, to prove Theorem 2.10 in the general case.

Lemma 5.4. Let $\mu,\rho$ be probability measures on $E$, and choose $S\in\mathcal{B}(E)$ such that $\mu(S)>0$. Define the probability measure $\nu = \mu(\,\cdot\cap S)/\mu(S)$. Then

Proof. If $\mu(S)=1$, the proof is trivial. Otherwise, by the Bayes formula,

$\mathbf{P}^{\mu\otimes\delta_w}$-a.s., where $\nu^\perp = \mu(\,\cdot\cap S^c)/\mu(S^c)$. But obviously

$\mathbf{P}^{\mu\otimes\delta_w}$-a.s., so the result follows directly. □

Remark 5.5. Even though we have fixed a version of the filter $\Pi^{\mu\otimes\delta_w}_n$, our results should ultimately not depend on the choice of version. In this light, Lemma 5.4 may appear somewhat suspicious, as the regular conditional probability $\mathbf{P}^{\nu\otimes\delta_w}(X_n\in\cdot\,|\mathcal{F}^Y_{[0,n]})$ is not $\mathbf{P}^{\mu\otimes\delta_w}$-a.s. uniquely defined. However, there is no problem here, as the proof shows that the inequality in Lemma 5.4 holds for any choice of version, even though different versions may be inequivalent. On the other hand, we will ultimately apply this result only when $\nu\ll\rho$, in which case the expression is in fact independent of the choice of version.

The idea is now to apply the recursive property of the filter:

$$\|\Pi^\mu_{m+n}-\Pi_{m+n}\|_{\mathrm{TV}} = \big\|\Pi_n^{\Pi^\mu_m\otimes\delta_{Y_m}}\circ\Theta^m-\Pi_n^{\Pi_m\otimes\delta_{Y_m}}\circ\Theta^m\big\|_{\mathrm{TV}}$$

for any $m\ge 0$. As $\Pi^\mu_m\sim\mathbf{P}^\mu(X_m\in\cdot\,|Y_0)$ and $\Pi_m\sim\mathbf{P}(X_m\in\cdot\,|Y_0)$ by Lemma 5.1, the assumption of Theorem 2.10 guarantees that the singular part of $\Pi^\mu_m$ with respect to $\Pi_m$ vanishes as $m\to\infty$. We can therefore use Lemma 5.4 to replace $\Pi^\mu_m$ by its absolutely continuous part, so that we have reduced the limit as $n\to\infty$ to the special case of Proposition 5.3. In order to apply Proposition 5.3, however, we will require one additional result.

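Lemma 5.6. Suppose that Assumptions 2.6-2.8 hold. Then for any $m\ge 0$

$$\bigcap_{n\ge 0}\mathcal{F}^Y_+\vee\mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y_+,\quad\mathbf{P}^{\Pi_m(y,\cdot)\otimes\delta_{y(m)}}\text{-a.s. for }\mathbf{P}^Y\text{-a.e. }y.$$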

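Proof. As in the proof of Proposition 4.6, it suffices to establish that the tail $\sigma$-field $\bigcap_{n\ge 0}\mathcal{F}^X_{[n,\infty[}$ is $\mathbf{P}^{\Pi_m(y,\cdot)\otimes\delta_{y(m)}}(\,\cdot\,|\mathcal{F}^Y_+)$-trivial $\mathbf{P}^{\Pi_m(y,\cdot)\otimes\delta_{y(m)}}$-a.s. for $\mathbf{P}^Y$-a.e. $y$. Note that

$$\mathbf{P}^{\Pi_m(Y,\cdot)\otimes\delta_{Y_m}}(A) = \mathbf{E}\big(\mathbf{P}^{X_m,Y_m}(A)\,\big|\,\mathcal{F}^Y_{[0,m]}\big) = \mathbf{P}\big(A\circ\Theta^m\,\big|\,\mathcal{F}^Y_{[0,m]}\big)$$

for all $A\in\mathcal{F}_+$ by the Markov property. Therefore,

$$\mathbf{P}^{\Pi_m(Y\circ\Theta^{-m},\cdot)\otimes\delta_{Y_0}}(A) = \mathbf{P}\big(A\circ\Theta^m\,\big|\,\mathcal{F}^Y_{[0,m]}\big)\circ\Theta^{-m} = \mathbf{P}\big(A\,\big|\,\mathcal{F}^Y_{[-m,0]}\big),$$

where we used that $\mathbf{P}$ is stationary. It follows that $\mathbf{P}^{\Pi_m(Y\circ\Theta^{-m},\cdot)\otimes\delta_{Y_0}}$ is a version of $\mathbf{P}((X_n,Y_n)_{n\ge 0}\in\cdot\,|\mathcal{F}^Y_{[-m,0]})$. By Lemma A.3, conditioning this measure further on $\mathcal{F}^Y_+$ yields a version of $\mathbf{P}(\,\cdot\,|\mathcal{F}^Y_{[-m,\infty[})$.

Thus it suffices to show that $\bigcap_{n\ge 0}\mathcal{F}^X_{[n,\infty[}$ is $\mathbf{P}(\,\cdot\,|\mathcal{F}^Y_{[-m,\infty[})$-trivial $\mathbf{P}$-a.s., which is equivalent (by virtue of Lemma A.4 in Appendix A) to

$$\bigcap_{n\ge 0}\mathcal{F}^Y_{[-m,\infty[}\vee\mathcal{F}^X_{[n,\infty[} = \mathcal{F}^Y_{[-m,\infty[},\quad\mathbf{P}\text{-a.s.}$$

But this follows directly from Theorem 2.9 and the stationarity of $\mathbf{P}$. □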

We can now complete the proof of Theorem 2.10. 

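Proof. By the recursive property of the filter,

$$\limsup_{k\to\infty}\|\Pi^\mu_k-\Pi_k\|_{\mathrm{TV}} = \limsup_{k\to\infty}\big\|\Pi_k^{\Pi^\mu_n(Y,\cdot)\otimes\delta_{Y_n}}(Y\circ\Theta^n,\cdot)-\Pi_k^{\Pi_n(Y,\cdot)\otimes\delta_{Y_n}}(Y\circ\Theta^n,\cdot)\big\|_{\mathrm{TV}}$$

for all $n\ge 0$. Therefore, we obtain $\mathbf{P}^\mu$-a.s.

$$\mathbf{E}^\mu\Big(\limsup_{k\to\infty}\|\Pi^\mu_k-\Pi_k\|_{\mathrm{TV}}\,\Big|\,\mathcal{F}^Y_{[0,n]}\Big) = \mathbf{E}^{\Pi^\mu_n(y,\cdot)\otimes\delta_{y(n)}}\Big(\limsup_{k\to\infty}\big\|\Pi_k^{\Pi^\mu_n(y,\cdot)\otimes\delta_{y(n)}}-\Pi_k^{\Pi_n(y,\cdot)\otimes\delta_{y(n)}}\big\|_{\mathrm{TV}}\Big)\Big|_{y=Y},$$

where we have used that $\Pi^\mu_n(Y,\cdot)$ and $\Pi_n(Y,\cdot)$ are $\mathcal{F}^Y_{[0,n]}$-measurable. To proceed, let us first recall that

$$\Pi^\mu_n(y,\cdot)\sim\mathbf{P}^{\mu_{y(0)}\otimes\delta_{y(0)}}(X_n\in\cdot\,)\quad\text{and}\quad\Pi_n(y,\cdot)\sim\mathbf{P}^{\pi_{y(0)}\otimes\delta_{y(0)}}(X_n\in\cdot\,)$$

for all $y,n$ by Lemma 5.1. Choose a set $S_n\in\mathcal{B}(E\times F)$ such that

$$\mathbf{P}^{\mu_w\otimes\delta_w}(X_n\in\cdot\cap S_n(w))\ll\mathbf{P}^{\pi_w\otimes\delta_w}(X_n\in\cdot\,)\quad\text{and}\quad\mathbf{P}^{\pi_w\otimes\delta_w}(X_n\in S_n(w)) = 1$$

for all $w\in F$, where $1_{S_n(w)}(z) = 1_{S_n}(z,w)$ (the existence of such a set follows from the Lebesgue decomposition for kernels; [9], Section V.58). Define

$$\Xi_n(y,\cdot) = \Pi^\mu_n(y,\cdot\cap S_n(y(0)))/\Pi^\mu_n(y,S_n(y(0))).$$

Then clearly $\Xi_n(y,\cdot)\ll\Pi_n(y,\cdot)$. Thus, by Lemma 5.4,

$$\mathbf{E}^{\Pi^\mu_n(y,\cdot)\otimes\delta_{y(n)}}\Big(\limsup_{k\to\infty}\big\|\Pi_k^{\Pi^\mu_n(y,\cdot)\otimes\delta_{y(n)}}-\Pi_k^{\Pi_n(y,\cdot)\otimes\delta_{y(n)}}\big\|_{\mathrm{TV}}\Big) \le 2\,\mathbf{P}^{\Pi^\mu_n(y,\cdot)\otimes\delta_{y(n)}}\big(X_0\notin S_n(y(0))\big) + \mathbf{E}^{\Xi_n(y,\cdot)\otimes\delta_{y(n)}}\Big(\limsup_{k\to\infty}\big\|\Pi_k^{\Xi_n(y,\cdot)\otimes\delta_{y(n)}}-\Pi_k^{\Pi_n(y,\cdot)\otimes\delta_{y(n)}}\big\|_{\mathrm{TV}}\Big).$$

The last term vanishes for $\mathbf{P}^Y$-a.e. $y$ by Proposition 5.3 and Lemma 5.6, hence for $\mathbf{P}^\mu(Y\in\cdot\,)$-a.e. $y$ by Corollary 3.6. We have therefore shown that

$$\mathbf{E}^\mu\Big(\limsup_{k\to\infty}\|\Pi^\mu_k-\Pi_k\|_{\mathrm{TV}}\,\Big|\,\mathcal{F}^Y_{[0,n]}\Big) \le 2\,\mathbf{P}^\mu\big(X_n\notin S_n(Y_0)\,\big|\,\mathcal{F}^Y_{[0,n]}\big),\quad\mathbf{P}^\mu\text{-a.s.}$$

for every $n\ge 0$. In particular, we have

$$\mathbf{E}^\mu\Big(\limsup_{k\to\infty}\|\Pi^\mu_k-\Pi_k\|_{\mathrm{TV}}\Big) \le 2\,\mathbf{P}^\mu\big(X_n\notin S_n(Y_0)\big)\quad\text{for all } n\ge 0.$$

But as $\mathbf{P}(X_n\notin S_n(Y_0)|Y_0) = \mathbf{P}^{\pi_{Y_0}\otimes\delta_{Y_0}}(X_n\notin S_n(Y_0)) = 0$, we obtain

$$\mathbf{P}^\mu\big(X_n\notin S_n(Y_0)\big) = \mathbf{E}^\mu\big(\mathbf{P}^\mu(X_n\notin S_n(Y_0)|Y_0)-\mathbf{P}(X_n\notin S_n(Y_0)|Y_0)\big) \le \mathbf{E}^\mu\big(\|\mathbf{P}^\mu(X_n\in\cdot\,|Y_0)-\mathbf{P}(X_n\in\cdot\,|Y_0)\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0,$$

where convergence follows as in the proof of Corollary 3.6. Therefore,

$$\limsup_{k\to\infty}\|\Pi^\mu_k-\Pi_k\|_{\mathrm{TV}} = 0,\quad\mathbf{P}^\mu\text{-a.s.},$$

which completes the main part of the proof. To obtain $\mathbf{P}$-a.s. convergence (rather than $\mathbf{P}^\mu$-a.s. convergence) in the case where $\mu(E\times\cdot\,)\sim\pi(E\times\cdot\,)$, it suffices to note that in this case $\mathbf{P}^\mu|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$ by Corollary 3.6. □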
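As a purely numerical illustration of the kind of stability that Theorem 2.10 describes, the following sketch runs two filters from different initial laws on the same observation record and compares them in total variation. Everything in it is an assumption made for illustration: a hypothetical two-state hidden chain with transition matrix `P0`, binary observations with likelihoods `lik`, a fixed deterministic observation record, and the convention that the total variation norm is the sum of absolute differences. It does not reproduce the general setting of the theorem.

```python
def filter_step(nu, y, P0, lik):
    """One filter update on a finite state space (prediction, then correction)."""
    d = len(nu)
    unnorm = [
        lik[z][y] * sum(P0[zp][z] * nu[zp] for zp in range(d))
        for z in range(d)
    ]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def tv(p, q):
    """Total variation norm taken as sum |p_i - q_i| (values in [0, 2])."""
    return sum(abs(a - b) for a, b in zip(p, q))


P0 = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical mixing transition matrix
lik = [[0.8, 0.2], [0.2, 0.8]]   # lik[z][y]: probability of observing y in state z

# A fixed, arbitrary binary observation record (deterministic, for reproducibility).
observations = [((3 * k * k + k) % 5) % 2 for k in range(200)]

filter_a = [0.5, 0.5]      # filter started from one initial law
filter_b = [0.01, 0.99]    # filter started from a different initial law
tv_initial = tv(filter_a, filter_b)
for y in observations:
    filter_a = filter_step(filter_a, y, P0, lik)
    filter_b = filter_step(filter_b, y, P0, lik)
tv_final = tv(filter_a, filter_b)
```

In this toy model all entries of `P0` and `lik` are strictly positive, so the two filters forget their initial laws geometrically fast; the theorem itself covers far more general situations in which no such uniform contraction is available.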

6. Proof of Theorem 2.12. The goal of this section is to prove Theorem 2.12. We begin by developing some details of the basic properties of $(\Pi_n,Y_n)_{n\ge 0}$ that were stated in Section 2.3 under Assumption 2.8. We then complete the proof of Theorem 2.12.

6.1. Markov property of the pair $(\Pi_n,Y_n)_{n\ge 0}$. Throughout this section, we assume that Assumption 2.8 is in force. We begin by defining a measurable map $U:\mathcal{P}(E)\times F\times F\to\mathcal{P}(E)$ as follows:

$$U(\nu,y_0,y_1)(A) = \frac{\int 1_A(z)\,g(z',y_0,z,y_1)\,P_0(z',dz)\,\nu(dz')}{\int g(z',y_0,z,y_1)\,P_0(z',dz)\,\nu(dz')}.$$

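It follows immediately from Lemma 5.1 that $\Pi^\mu_n = U(\Pi^\mu_{n-1},Y_{n-1},Y_n)$ $\mathbf{P}^\mu$-a.s. for every $n\ge 1$ and $\mu\in\mathcal{P}(E\times F)$.

Now define the transition kernel $\Gamma:\mathcal{P}(E)\times F\times\mathcal{B}(\mathcal{P}(E)\times F)\to[0,1]$ as

$$\Gamma(\nu,y_0,A) = \int 1_A(U(\nu,y_0,y_1),y_1)\,P(z,y_0,dz',dy_1)\,\nu(dz).$$

Then we have the following lemma.

Lemma 6.1. Suppose that Assumption 2.8 holds. Then the $(\mathcal{P}(E)\times F)$-valued process $(\Pi^\mu_n,Y_n)_{n\ge 0}$ is Markov under $\mathbf{P}^\mu$ with transition kernel $\Gamma$.

Proof. It suffices to note that $(\Pi^\mu_n,Y_n)$ is $\mathcal{F}^Y_{[0,n]}$-measurable and

$$\begin{aligned}\mathbf{P}^\mu\big((\Pi^\mu_{n+1},Y_{n+1})\in A\,\big|\,\mathcal{F}^Y_{[0,n]}\big) &= \mathbf{P}^\mu\big((U(\Pi^\mu_n,Y_n,Y_{n+1}),Y_{n+1})\in A\,\big|\,\mathcal{F}^Y_{[0,n]}\big)\\ &= \int 1_A(U(\Pi^\mu_n,Y_n,w),w)\,P(z,Y_n,dz',dw)\,\Pi^\mu_n(dz) = \Gamma(\Pi^\mu_n,Y_n,A)\end{aligned}$$

for every $A\in\mathcal{B}(\mathcal{P}(E)\times F)$. □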
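To make the filter recursion $U$ of this section concrete, here is a minimal sketch of one update step when the hidden state space is finite, so that the integrals in the definition of $U$ become sums. The function name `filter_update`, the matrix `P0`, the density `g` and the numbers in the example are hypothetical choices for illustration only; in particular the toy `g` ignores the previous state and the previous observation, which the general definition does not require.

```python
def filter_update(nu, y0, y1, P0, g):
    """One step of the recursion U(nu, y0, y1) on a finite hidden state space.

    nu : list of floats, current filter (probabilities over states 0..d-1)
    P0 : reference kernel as a matrix, P0[z_prev][z]
    g  : transition density g(z_prev, y0, z, y1), assumed strictly positive
    """
    d = len(nu)
    # Numerator of U: sum over z' of g(z', y0, z, y1) * P0(z', z) * nu(z').
    unnorm = [
        sum(g(zp, y0, z, y1) * P0[zp][z] * nu[zp] for zp in range(d))
        for z in range(d)
    ]
    total = sum(unnorm)  # denominator of U (normalizing constant)
    if total <= 0.0:
        raise ValueError("normalizing constant must be positive")
    return [u / total for u in unnorm]


# Hypothetical two-state example: the new observation equals the new hidden
# state with probability 0.8, regardless of the previous state and observation.
P0 = [[0.9, 0.1], [0.2, 0.8]]


def g(z_prev, y0, z, y1):
    return 0.8 if y1 == z else 0.2


posterior = filter_update([0.5, 0.5], 0, 1, P0, g)
```

Iterating `filter_update` along an observation sequence produces the finite-state analogue of the sequence $\Pi^\mu_n = U(\Pi^\mu_{n-1},Y_{n-1},Y_n)$.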

We can now establish some basic properties of $\Gamma$-invariant measures.

Lemma 6.2. Suppose that Assumption 2.8 holds. Then for any $\Gamma$-invariant probability measure $m$, the barycenter $bm$ is a $P$-invariant measure. Conversely, there is at least one $\Gamma$-invariant measure with barycenter $\pi$.

Proof. First, let $m\in\mathcal{P}(\mathcal{P}(E)\times F)$ be a $\Gamma$-invariant measure. Then

$$\begin{aligned}bm(A\times B) &= \int\nu(A)\,1_B(w)\,\Gamma(\nu',w',d\nu,dw)\,m(d\nu',dw')\\ &= \int U(\nu',w',w)(A)\,1_B(w)\,P(z,w',dz',dw)\,\nu'(dz)\,m(d\nu',dw')\\ &= \int\frac{\int 1_A(z')\,g(z,w',z',w)\,P_0(z,dz')\,\nu'(dz)}{\int g(z,w',z',w)\,P_0(z,dz')\,\nu'(dz)}\times g(x,w',x',w)\,P_0(x,dx')\,\nu'(dx)\,1_B(w)\,Q(w',dw)\,m(d\nu',dw')\\ &= \int P(z,w',A\times B)\,\nu'(dz)\,m(d\nu',dw')\\ &= \int P(z,w',A\times B)\,bm(dz,dw').\end{aligned}$$

Thus the barycenter $bm$ is $P$-invariant.

To prove the converse, let $\Pi_n$ be a version of the regular conditional probability $\mathbf{P}(X_n\in\cdot\,|\mathcal{F}^Y_{]-\infty,n]})$ and let $\Pi_{k,n}$ be a version of the regular conditional probability $\mathbf{P}(X_n\in\cdot\,|\mathcal{F}^Y_{[k,n]})$. Applying the Bayes formula as in the proof of Lemma 5.1, we find that $U(\Pi_{k,n},Y_n,Y_{n+1}) = \Pi_{k,n+1}$ $\mathbf{P}$-a.s. for every $k\le n$. By the martingale convergence theorem, it follows directly that

$$U(\Pi_n,Y_n,Y_{n+1})(A) = \lim_{k\to-\infty}U(\Pi_{k,n},Y_n,Y_{n+1})(A) = \Pi_{n+1}(A)$$

$\mathbf{P}$-a.s. for every $A\in\mathcal{B}(E)$. As $\mathcal{B}(E)$ is countably generated, a standard monotone class argument shows that $U(\Pi_n,Y_n,Y_{n+1}) = \Pi_{n+1}$ $\mathbf{P}$-a.s. Therefore, the proof of Lemma 6.1 shows that $(\Pi_n,Y_n)_{n\in\mathbb{Z}}$ is Markov under $\mathbf{P}$ with transition kernel $\Gamma$. But as $\mathbf{P}$ is stationary, the process $(\Pi_n,Y_n)_{n\in\mathbb{Z}}$ is stationary also. Therefore, the law of $(\Pi_0,Y_0)$ is a $\Gamma$-invariant measure whose barycenter is obviously $\pi$. □

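6.2. Uniqueness of the $\Gamma$-invariant measure. Given $m\in\mathcal{P}(\mathcal{P}(E)\times F)$, define the probability measure $\mathbf{P}_m$ on the space $\mathcal{P}(E)\times E^{\mathbb{Z}_+}\times F^{\mathbb{Z}_+}$ as

$$\mathbf{P}_m\big((m_0,X_0,\ldots,X_n,Y_0,\ldots,Y_n)\in A\big) = \int 1_A(\nu,x_0,\ldots,x_n,y_0,\ldots,y_n)\,\nu(dx_0)\,P(x_0,y_0,dx_1,dy_1)\times\cdots\times P(x_{n-1},y_{n-1},dx_n,dy_n)\,m(d\nu,dy_0).$$

We now choose regular versions of the following conditional probabilities:

$$\Pi^{\min}_n = \mathbf{P}_m(X_n\in\cdot\,|\mathcal{F}^Y_{[0,n]}),\qquad\Pi^m_n = \mathbf{P}_m(X_n\in\cdot\,|\sigma(m_0)\vee\mathcal{F}^Y_{[0,n]}),\qquad\Pi^{\max}_n = \mathbf{P}_m(X_n\in\cdot\,|\sigma(m_0,X_0)\vee\mathcal{F}^Y_{[0,n]}).$$

The following result is straightforward.

Lemma 6.3. The laws of $(\Pi^{\min}_n,Y_n)$ and $(\Pi^{\max}_n,Y_n)$ under $\mathbf{P}_m$ coincide with the laws of $(\mathbf{P}^{bm}(X_n\in\cdot\,|\mathcal{F}^Y_{[0,n]}),Y_n)$ and $(\mathbf{P}^{bm}(X_n\in\cdot\,|\sigma(X_0)\vee\mathcal{F}^Y_{[0,n]}),Y_n)$ under $\mathbf{P}^{bm}$, respectively. Moreover, the process $(\Pi^m_n,Y_n)_{n\ge 0}$ is Markov under $\mathbf{P}_m$ with transition kernel $\Gamma$ and initial measure $m$.

Proof. By definition of the barycenter, the law of $(X_n,Y_n)_{n\ge 0}$ under $\mathbf{P}_m$ coincides with the law of $(X_n,Y_n)_{n\ge 0}$ under $\mathbf{P}^{bm}$. Moreover, it is easily seen that $\Pi^{\max}_n = \mathbf{P}_m(X_n\in\cdot\,|\sigma(X_0)\vee\mathcal{F}^Y_{[0,n]})$ by the Markov property, so $\Pi^{\max}_n$ and $\Pi^{\min}_n$ depend on $(X_n,Y_n)_{n\ge 0}$ only. This establishes the first part of the result. The second part follows as in the proof of Lemma 6.1. □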

We can now complete the proof of Theorem 2.12. 

Proof. Throughout the proof, let $m$ be a fixed $\Gamma$-invariant probability measure with barycenter $\pi$. We will show that, by virtue of Theorem 2.9, this invariant measure must necessarily coincide with the invariant measure obtained in the proof of Lemma 6.2.

Let $p\in\mathbb{N}$, choose arbitrary bounded measurable functions $f:E\to\mathbb{R}^p$ and $g:F\to\mathbb{R}$ and let $\kappa:\mathbb{R}^{p+1}\to\mathbb{R}$ be a convex function. Then $\kappa$ is necessarily continuous, so the function $F:\mathcal{P}(E)\times F\to\mathbb{R}$ defined by

$$F(\nu,w) = \kappa\Big(g(w),\int f(x)\,\nu(dx)\Big)$$

is bounded and measurable. By Jensen's inequality,

$$\mathbf{E}_m\big(F(\Pi^{\min}_n,Y_n)\big)\le\mathbf{E}_m\big(F(\Pi^m_n,Y_n)\big)\le\mathbf{E}_m\big(F(\Pi^{\max}_n,Y_n)\big)$$

for all $n\ge 0$. Therefore, by Lemma 6.3 and the $\Gamma$-invariance of $m$, we obtain

$$\mathbf{E}\big(\kappa\big(g(Y_n),\mathbf{E}(f(X_n)|\mathcal{F}^Y_{[0,n]})\big)\big)\le\int F(\nu,w)\,m(d\nu,dw)\le\mathbf{E}\big(\kappa\big(g(Y_n),\mathbf{E}(f(X_n)|\sigma(X_0)\vee\mathcal{F}^Y_{[0,n]})\big)\big).$$

But using stationarity of $\mathbf{P}$ and the Markov property of $(X_n,Y_n)_{n\in\mathbb{Z}}$,

$$\begin{aligned}\mathbf{E}\big(\kappa\big(g(Y_n),\mathbf{E}(f(X_n)|\mathcal{F}^Y_{[0,n]})\big)\big) &= \mathbf{E}\big(\kappa\big(g(Y_0),\mathbf{E}(f(X_0)|\mathcal{F}^Y_{[-n,0]})\big)\big),\\ \mathbf{E}\big(\kappa\big(g(Y_n),\mathbf{E}(f(X_n)|\sigma(X_0)\vee\mathcal{F}^Y_{[0,n]})\big)\big) &= \mathbf{E}\big(\kappa\big(g(Y_0),\mathbf{E}(f(X_0)|\mathcal{F}^Y_{]-\infty,0]}\vee\mathcal{F}^X_{]-\infty,-n]})\big)\big)\end{aligned}$$

for all $n\ge 0$. Thus martingale convergence and Theorem 2.9 yield

$$\int F(\nu,w)\,m(d\nu,dw) = \mathbf{E}\big(\kappa\big(g(Y_0),\mathbf{E}(f(X_0)|\mathcal{F}^Y_{]-\infty,0]})\big)\big) = \int F(\nu,w)\,m^\star(d\nu,dw),$$

where $m^\star$ denotes the distinguished $\Gamma$-invariant measure obtained in the proof of Lemma 6.2. But a standard approximation argument shows that the class of functions of the form $F(\nu,w) = \kappa(g(w),\int f(x)\,\nu(dx))$ is measure-determining (see, e.g., the proof of Proposition A.7 in [22]), so we can conclude that $m = m^\star$. Thus we have shown that any $\Gamma$-invariant probability measure with barycenter $\pi$ must coincide with $m^\star$, which establishes uniqueness.

To complete the proof, it remains to consider the case when $P$ has a unique invariant probability measure (i.e., $\pi$ is the only $P$-invariant probability measure). As the barycenter of any $\Gamma$-invariant probability measure must be $P$-invariant, this implies that any $\Gamma$-invariant measure must have barycenter $\pi$. Therefore, in this case, $\Gamma$ has a unique invariant probability measure. □

7. Proof of Theorem 2.13. The goal of this section is to prove Theorem 2.13. We begin by developing some details of the basic properties of $(\Pi_n,X_n,Y_n)_{n\ge 0}$ that were stated in Section 2.3 under Assumption 2.8. We then complete the proof of Theorem 2.13.

7.1. Markov property of the triple $(\Pi_n,X_n,Y_n)_{n\ge 0}$. In this section we use the notation of Section 6.1, and we again assume that Assumption 2.8 is in force. Define the transition kernel $\Lambda:\mathcal{P}(E)\times E\times F\times\mathcal{B}(\mathcal{P}(E)\times E\times F)\to[0,1]$ as

$$\Lambda(\nu,x_0,y_0,A) = \int 1_A(U(\nu,y_0,y_1),x_1,y_1)\,P(x_0,y_0,dx_1,dy_1).$$

Then we have the following lemma.

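Lemma 7.1. Suppose that Assumption 2.8 holds. Then $(\Pi^\mu_n,X_n,Y_n)_{n\ge 0}$ is a $(\mathcal{P}(E)\times E\times F)$-valued Markov chain under $\mathbf{P}^\mu$ with transition kernel $\Lambda$.

Proof. It suffices to note that $(\Pi^\mu_n,X_n,Y_n)$ is $\mathcal{F}_{[0,n]}$-measurable and

$$\begin{aligned}\mathbf{P}^\mu\big((\Pi^\mu_{n+1},X_{n+1},Y_{n+1})\in A\,\big|\,\mathcal{F}_{[0,n]}\big) &= \int 1_A(U(\Pi^\mu_n,Y_n,w),z,w)\,P(X_n,Y_n,dz,dw)\\ &= \Lambda(\Pi^\mu_n,X_n,Y_n,A)\end{aligned}$$

for every $A\in\mathcal{B}(\mathcal{P}(E)\times E\times F)$. □

For any probability measure $M\in\mathcal{P}(\mathcal{P}(E)\times E\times F)$, we define probability measures $mM\in\mathcal{P}(E\times F)$ and $\gamma M\in\mathcal{P}(\mathcal{P}(E)\times F)$ as follows:

$$mM(A\times B) = M(\mathcal{P}(E)\times A\times B),\qquad\gamma M(C\times B) = M(C\times E\times B).$$

Moreover, we define the class

$$\mathfrak{M} = \Big\{M\in\mathcal{P}(\mathcal{P}(E)\times E\times F):\ M(A\times B\times C) = \int 1_A(\nu)\,\nu(B)\,1_C(w)\,M(d\nu,dz,dw)\ \text{for all } A,B,C\Big\}.$$

We can now establish some basic properties of $\Lambda$-invariant measures.

Lemma 7.2. Suppose that Assumption 2.8 holds. Then for any $\Lambda$-invariant probability measure $M$, the marginal $mM$ is a $P$-invariant measure. If, in addition, $M\in\mathfrak{M}$, then $\gamma M$ is a $\Gamma$-invariant measure with barycenter $mM$. Conversely, there is at least one $\Lambda$-invariant $M\in\mathfrak{M}$ with marginal $\pi$.

Proof. Let $M\in\mathcal{P}(\mathcal{P}(E)\times E\times F)$ be a $\Lambda$-invariant probability measure. It is trivial that $mM$ is $P$-invariant. Now suppose that also $M\in\mathfrak{M}$. Then

$$\begin{aligned}\int 1_A(\nu',w')\,\Lambda(\nu,z,w,d\nu',dz',dw')\,M(d\nu,dz,dw) &= \int 1_A(U(\nu,w,w'),w')\,P(z,w,dz',dw')\,M(d\nu,dz,dw)\\ &= \int 1_A(U(\nu,w,w'),w')\,P(z,w,dz',dw')\,\nu(dz)\,\gamma M(d\nu,dw)\\ &= \int 1_A(\nu',w')\,\Gamma(\nu,w,d\nu',dw')\,\gamma M(d\nu,dw),\end{aligned}$$

where we have used that $M\in\mathfrak{M}$ in the penultimate equality. As the left-hand side equals $\gamma M(A)$ by the $\Lambda$-invariance of $M$, it follows that $\gamma M$ is a $\Gamma$-invariant measure. Moreover, it follows from the definition of $\mathfrak{M}$ that

$$\int\nu(B)\,1_C(w)\,\gamma M(d\nu,dw) = M(\mathcal{P}(E)\times B\times C) = mM(B\times C),$$

so $mM$ is the barycenter of $\gamma M$. Finally, let $\Pi_0$ be a version of the regular conditional probability $\mathbf{P}(X_0\in\cdot\,|\mathcal{F}^Y_{]-\infty,0]})$. Then as in the proof of Lemma 6.2, the law of $(\Pi_0,X_0,Y_0)$ is a $\Lambda$-invariant measure in $\mathfrak{M}$ with marginal $\pi$. □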

7.2. Uniqueness of the $\Lambda$-invariant measure. The first part of the proof of Theorem 2.13 follows easily from Theorem 2.12 and Lemma 7.2.

Lemma 7.3. Suppose that Assumptions 2.6-2.8 hold. Then there is a unique $\Lambda$-invariant probability measure with marginal $\pi$ in the class $\mathfrak{M}$.

Proof. Lemma 7.2 guarantees the existence of a $\Lambda$-invariant measure in $\mathfrak{M}$ with marginal $\pi$. To prove uniqueness, note that every probability measure $M\in\mathfrak{M}$ is uniquely determined by $\gamma M$ as

$$M(A\times B\times C) = \int\nu(B)\,1_{A\times C}(\nu,w)\,\gamma M(d\nu,dw).$$

Therefore, by Lemma 7.2, if there were to exist two distinct $\Lambda$-invariant measures in $\mathfrak{M}$ with marginal $\pi$, then there must exist two distinct $\Gamma$-invariant measures with barycenter $\pi$, in contradiction with Theorem 2.12. □

The second part of the proof of Theorem 2.13 relies on Theorem 2.10 
instead of Theorem 2.12. To prepare for the proof, we begin by showing that 
the strengthened variant of Assumption 2.6 in Theorem 2.13 is equivalent 
to the requirement that the assumption of Theorem 2.10 holds universally. 

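Lemma 7.4. The following are equivalent:

(1) For every probability measure $\mu$ on $E\times F$ such that $\mu(E\times\cdot\,) = \pi(E\times\cdot\,)$

$$\int\|\mathbf{P}^{z,w}(X_n\in\cdot\,)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\,\mu(dz,dw)\xrightarrow{n\to\infty}0.$$

(2) For every probability measure $\mu$ on $E\times F$ such that $\mu(E\times\cdot\,)\ll\pi(E\times\cdot\,)$

$$\mathbf{E}^\mu\big(\|\mathbf{P}^\mu(X_n\in\cdot\,|Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0.$$

Proof. (1) $\Rightarrow$ (2). Let $\mu$ be any probability measure on $E\times F$ with $\mu(E\times\cdot\,)\ll\pi(E\times\cdot\,)$, let $\mu_w(dz)$ be a version of the regular conditional probability $\mathbf{P}^\mu(X_0\in\cdot\,|Y_0)$ and define $\mu'(dz,dw) = \mu_w(dz)\,\pi(E\times dw)$. Then $\mu'(E\times\cdot\,) = \pi(E\times\cdot\,)$, so the first statement of the lemma implies that we have

$$\|\mathbf{P}^{X_0,Y_0}(X_n\in\cdot\,)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0\quad\text{in }\mathbf{P}^{\mu'}\text{-probability}.$$

But $\mu\ll\mu'$ by construction, so the convergence also holds in $\mathbf{P}^\mu$-probability. Therefore, we obtain by dominated convergence

$$\mathbf{E}^\mu\big(\|\mathbf{P}^\mu(X_n\in\cdot\,|Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)\le\mathbf{E}^\mu\big(\|\mathbf{P}^{X_0,Y_0}(X_n\in\cdot\,)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0.$$

Thus the second statement of the lemma follows.

(2) $\Rightarrow$ (1). Let $\mu$ be any probability measure on $E\times F$ such that $\mu(E\times\cdot\,) = \pi(E\times\cdot\,)$ and let $\mu_w(dz)$ be a version of the regular conditional probability $\mathbf{P}^\mu(X_0\in\cdot\,|Y_0)$. By [13], Lemma 3.22, there is a measurable function $\iota:F\times[0,1]\to E$ such that $\int f(z)\,\mu_w(dz) = \int_0^1 f(\iota(w,x))\,dx$ for all $w$. Applying the second statement of the lemma to $\mu^x(dz,dw) = \delta_{\iota(w,x)}(dz)\,\mu(E\times dw) = \delta_{\iota(w,x)}(dz)\,\pi(E\times dw)$ gives

$$\int\|\mathbf{P}^{\iota(w,x),w}(X_n\in\cdot\,)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\,\mu(E\times dw)\xrightarrow{n\to\infty}0\quad\text{for all } x\in[0,1].$$

Thus the first statement of the lemma follows by integrating with respect to $dx$ and applying the dominated convergence theorem. □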
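The representation $\int f(z)\,\mu_w(dz) = \int_0^1 f(\iota(w,x))\,dx$ used in the second half of the proof can be made concrete when the state space is finite: one possible choice of $\iota(w,\cdot)$ is then a generalized inverse of the cumulative distribution function of $\mu_w$. The sketch below checks the identity numerically for a single fixed $w$. The three-point law `mu_w`, the test function `f`, the helper name `iota` and the midpoint grid are all assumptions introduced for illustration; the cited lemma of [13] covers general Borel spaces and is not tied to this construction.

```python
def iota(mu, x):
    """Generalized inverse CDF: smallest state z with mu[0] + ... + mu[z] > x."""
    cumulative = 0.0
    for z, mass in enumerate(mu):
        cumulative += mass
        if x < cumulative:
            return z
    return len(mu) - 1  # guard against floating-point round-off for x close to 1


mu_w = [0.2, 0.5, 0.3]    # hypothetical conditional law on the states {0, 1, 2}
f = [1.0, -2.0, 4.0]      # a bounded test function on the same states

# Left-hand side: the integral of f against mu_w.
lhs = sum(fz * mz for fz, mz in zip(f, mu_w))

# Right-hand side: midpoint-rule approximation of the integral over x in [0, 1].
N = 1000
rhs = sum(f[iota(mu_w, (k + 0.5) / N)] for k in range(N)) / N
```

Because `iota(mu_w, .)` is piecewise constant with jumps at multiples of `1/10`, the midpoint rule on this grid reproduces the integral up to floating-point error.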

Let us note that only the first half of this result is needed in what follows. 
However, the equivalence of the two assumptions shows that we have not 
unnecessarily strengthened the assumptions of Theorem 2.13. 

For the proof of Theorem 2.13, we require another lemma. 

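Lemma 7.5. Suppose that Assumptions 2.6-2.8 are in force and that

$$\mathbf{E}^\mu\big(\|\mathbf{P}^\mu(X_n\in\cdot\,|Y_0)-\mathbf{P}(X_n\in\cdot\,)\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0$$

for every probability measure $\mu$ on $E\times F$ with $\mu(E\times\cdot\,)\ll\pi(E\times\cdot\,)$. Then

$$\int\mathbf{E}^{z,w}\big(\|\Pi_n^{m(z,w)\otimes\delta_w}-\Pi_n\|_{\mathrm{TV}}\big)\,\pi(dz,dw)\xrightarrow{n\to\infty}0$$

for any measurable function $m:E\times F\to\mathcal{P}(E)$.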




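Proof. By Proposition 3.3 and the Bayes formula, there is a strictly positive measurable function $h:E\times F\to\,]0,\infty[$ such that the probability kernel

$$\pi^X(z,A) = \frac{\int 1_A(w)\,h(z,w)\,\pi(E\times dw)}{\int h(z,w)\,\pi(E\times dw)}\quad\text{for all } z\in E,\ A\in\mathcal{B}(F)$$

is a version of the regular conditional probability $\mathbf{P}(Y_0\in\cdot\,|X_0)$. In particular, $\pi^X(z,\cdot)\sim\pi(E\times\cdot\,)$ for all $z\in E$, so by our assumptions and Corollary 3.6 we obtain $\mathbf{P}^{\delta_z\otimes\pi^X(z,\cdot)}|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$ for all $z\in E$.

Fix a measurable function $m:E\times F\to\mathcal{P}(E)$. For every $z\in E$, define $\mu^z(dz',dw) = m(z,w)(dz')\,\pi(E\times dw)$. Then by Theorem 2.10, we have

$$\|\Pi^{\mu^z}_n-\Pi_n\|_{\mathrm{TV}}\xrightarrow{n\to\infty}0,\quad\mathbf{P}\text{-a.s.}$$

for all $z\in E$. Thus by $\mathbf{P}^{\delta_z\otimes\pi^X(z,\cdot)}|_{\mathcal{F}^Y_+}\sim\mathbf{P}|_{\mathcal{F}^Y_+}$ and dominated convergence,

$$\mathbf{E}^{\delta_z\otimes\pi^X(z,\cdot)}\big(\|\Pi^{\mu^z}_n-\Pi_n\|_{\mathrm{TV}}\big)\xrightarrow{n\to\infty}0$$

for all $z\in E$. But by Lemma 5.1 we have $\Pi^{\mu^z}_n = \Pi_n^{m(z,w)\otimes\delta_w}$ $\mathbf{P}^{z,w}$-a.s. for all $n\ge 0$. Integrating with respect to $\pi(dz\times F)$ and applying the dominated convergence theorem completes the proof. □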

We now proceed to the proof of Theorem 2.13. Let $Z$ be any Polish space endowed with the complete metric $d_Z$. Recall that the space $\mathcal{P}(Z)$ is Polish when endowed with the metric (cf. [12], Theorem 11.3.3 and Corollary 11.5.5)

$$d_{\mathcal{P}(Z)}(\nu,\nu') = \sup\Big\{\Big|\int f(z)\,\nu(dz)-\int f(z)\,\nu'(dz)\Big|:\ \sup_x|f(x)|\le 1,\ \sup_{x\ne y\in Z}\frac{|f(x)-f(y)|}{d_Z(x,y)}\le 1\Big\}.$$

In particular, the complete metric

$$D((\nu,z,w),(\nu',z',w')) = d_{\mathcal{P}(E)}(\nu,\nu')+d_E(z,z')+d_F(w,w')$$

metrizes the topology of $\mathcal{P}(E)\times E\times F$.

Proof of Theorem 2.13. The first part of the theorem was established in Lemma 7.3. For the remainder of the proof, let us assume that one of the equivalent assumptions in Lemma 7.4 is in force. We will show that any two $\Lambda$-invariant probability measures with marginal $\pi$ must coincide.

To this end, let $M$ and $M'$ be two $\Lambda$-invariant probability measures with marginal $\pi$. By [13], Lemma 3.22, there exist measurable functions $m:E\times F\times[0,1]\to\mathcal{P}(E)$ and $m':E\times F\times[0,1]\to\mathcal{P}(E)$ such that

$$\begin{aligned}\int f(\nu,z,w)\,M(d\nu,dz,dw) &= \int_0^1\!\!\int f(m(z,w,x),z,w)\,\pi(dz,dw)\,dx,\\ \int f(\nu,z,w)\,M'(d\nu,dz,dw) &= \int_0^1\!\!\int f(m'(z,w,x),z,w)\,\pi(dz,dw)\,dx\end{aligned}$$

for every bounded measurable function $f:\mathcal{P}(E)\times E\times F\to\mathbb{R}$. Moreover, note that by the definition of $\Lambda$ and Lemma 5.1

$$\int f(\nu',z',w')\,\Lambda^n(\nu,z,w,d\nu',dz',dw') = \mathbf{E}^{z,w}\big(f(\Pi_n^{\nu\otimes\delta_w},X_n,Y_n)\big).$$

Let us now fix a bounded function $f$ such that

$$|f(\nu,z,w)-f(\nu',z',w')|\le D((\nu,z,w),(\nu',z',w'))$$

for all $\nu,\nu'\in\mathcal{P}(E)$, $z,z'\in E$, $w,w'\in F$. We can now estimate

$$\Big|\int f(\nu,z,w)\,M(d\nu,dz,dw)-\int f(\nu,z,w)\,M'(d\nu,dz,dw)\Big|\le\int_0^1\!\!\int\mathbf{E}^{z,w}\big(\big\|\Pi_n^{m(z,w,x)\otimes\delta_w}-\Pi_n^{m'(z,w,x)\otimes\delta_w}\big\|_{\mathrm{TV}}\big)\,\pi(dz,dw)\,dx$$

for every $n\ge 0$, where we used that $M\Lambda^n = M$ and $M'\Lambda^n = M'$. By the triangle inequality, Lemma 7.5 and the dominated convergence theorem, the right-hand side of this inequality converges to zero as $n\to\infty$. Therefore, we have shown that

$$\int f(\nu,z,w)\,M(d\nu,dz,dw)-\int f(\nu,z,w)\,M'(d\nu,dz,dw) = 0$$

for all bounded functions $f$ that are 1-Lipschitz for the metric $D$. In other words, $d_{\mathcal{P}(\mathcal{P}(E)\times E\times F)}(M,M') = 0$, so $M = M'$. Thus we have shown that all $\Lambda$-invariant probability measures with marginal $\pi$ must coincide, establishing uniqueness.

To complete the proof, it remains to consider the case when $P$ has a unique invariant probability measure (i.e., $\pi$ is the only $P$-invariant probability measure). As the marginal of any $\Lambda$-invariant probability measure must be $P$-invariant, this implies that any $\Lambda$-invariant measure must have marginal $\pi$. Therefore, in this case, $\Lambda$ has a unique invariant probability measure. □



Remark 7.6. It is instructive to note that Assumptions 2.6-2.8 are not sufficient to ensure uniqueness of the $\Lambda$-invariant probability measure even in the case that $P$ has a unique invariant probability measure. Let us briefly sketch a counterexample. Let $E = \mathbb{R}\times\{0,1\}$ and $F = \mathbb{R}$, and consider the filtering model

where $(\xi_n)_{n\ge 0}$, $(\eta_n)_{n\ge 0}$ are i.i.d. $N(0,1)$-distributed random variables. It is clear that the corresponding transition kernel $P$ has a unique invariant probability measure $\pi$ [with $\pi(\,\cdot\times F) = N(0,1)\otimes\delta_0$] and that Assumptions 2.6-2.8 hold.

Now let $\mu = \delta_0\otimes\delta_1\otimes N(0,1)$. Then $\Pi^\mu_n = N(m_n,\sigma^2_n)\otimes\delta_1$, where $m_n$ and $\sigma^2_n$ can be computed recursively using the Kalman filtering equations corresponding to the model $X_n = 2X_{n-1}+\xi_n$, $Y_n = X_n+\eta_n$. It is easily verified by inspection of the Kalman filtering equations that the law of $(\Pi^\mu_n,X_n,Y_n)$ converges weakly as $n\to\infty$ under the stationary measure $\mathbf{P}$. The limiting law is therefore a $\Lambda$-invariant probability measure that is supported on $\mathcal{P}(\mathbb{R}\times\{1\})\times E\times F$. On the other hand, the $\Lambda$-invariant measure defined in the proof of Lemma 7.2 is clearly supported on $\mathcal{P}(\mathbb{R}\times\{0\})\times E\times F$. Therefore, $\Lambda$ has distinct invariant measures.

This example illustrates that the stronger assumption of Theorem 2.13 is indeed required to establish uniqueness of the $\Lambda$-invariant measure in the class of all probability measures. Of course, the first part of Theorem 2.13 is not contradicted, as the additional $\Lambda$-invariant measure obtained in this example is not in $\mathfrak{M}$.
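The variance part of the Kalman filtering equations mentioned in the remark can be iterated numerically. The sketch below is an illustration under stated assumptions rather than part of the argument: it uses the scalar model $X_n = 2X_{n-1}+\xi_n$, $Y_n = X_n+\eta_n$ with unit noise variances, starts from variance zero (reading the real-valued component of the initial law as a point mass), and the function name `kalman_variance_step` and its parameters are hypothetical.

```python
import math


def kalman_variance_step(var, a=2.0, q=1.0, r=1.0):
    """One Riccati step for X_n = a X_{n-1} + xi_n, Y_n = X_n + eta_n.

    var is the filtering variance at time n-1; q and r are the variances
    of the state noise xi_n and of the observation noise eta_n.
    """
    predicted = a * a * var + q               # prediction step
    return predicted * r / (predicted + r)    # correction step


var = 0.0  # assumed initial variance (point mass in the real-valued component)
for _ in range(60):
    var = kalman_variance_step(var)

# Positive root of 4 s^2 - 2 s - 1 = 0, the fixed point of the recursion.
limit = (1.0 + math.sqrt(5.0)) / 4.0
```

The variance recursion does not depend on the observations, so its convergence to a fixed point is a deterministic fact about the map $s\mapsto(4s+1)/(4s+2)$; the convergence in law of the full triple asserted in the remark additionally involves the mean $m_n$, which this sketch does not track.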

APPENDIX A: AUXILIARY RESULTS 

The goal of this Appendix is to collect for easy reference a few auxiliary 
results that are used throughout the paper. 

The following result on the existence of invariant sets for stationary Markov chains is given in [20], Lemma 2.6. The construction of the set $H$ follows closely along the lines of [18], pages 1636 and 1637, so the proof is omitted.

Lemma A.1. Let $\mathbf{P}^z$ be the law of a Markov process $(Z_k)_{k\ge 0}$ given $Z_0 = z$, and let $\nu$ be a stationary probability for this Markov process. Then for any set $\tilde{H}$ of $\nu$-full measure, there is a subset $H\subset\tilde{H}$ of $\nu$-full measure such that

$$\mathbf{P}^z(Z_n\in H\text{ for all } n\ge 0) = 1\quad\text{for all } z\in H.$$

The following elementary result can be found in [20], Lemma 3.6.

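Lemma A.2. Let $G_1$, $G_2$ and $K$ be Polish spaces and set $\Omega = G_1\times G_2\times K$. We consider a probability measure $\mathbf{P}$ on $(\Omega,\mathcal{B}(\Omega))$. Denote by $\gamma_1:\Omega\to G_1$, $\gamma_2:\Omega\to G_2$, and $\kappa:\Omega\to K$ the coordinate projections, and let $\mathcal{G}_1$, $\mathcal{G}_2$ and $\mathcal{K}$ be the $\sigma$-fields generated by $\gamma_1$, $\gamma_2$ and $\kappa$, respectively. Choose fixed versions of the following regular conditional probabilities:

$$\Xi^K_1(g_1,\cdot) = \mathbf{P}(\kappa\in\cdot\,|\mathcal{G}_1)(g_1),\qquad\Xi^K_{12}(g_1,g_2,\cdot) = \mathbf{P}(\kappa\in\cdot\,|\mathcal{G}_1\vee\mathcal{G}_2)(g_1,g_2),$$

$$\Xi^G_1(g_1,\cdot) = \mathbf{P}(\gamma_2\in\cdot\,|\mathcal{G}_1)(g_1),\qquad\Xi^G_{1K}(g_1,k,\cdot) = \mathbf{P}(\gamma_2\in\cdot\,|\mathcal{G}_1\vee\mathcal{K})(g_1,k),$$

where $g_1\in G_1$, $g_2\in G_2$, $k\in K$. Suppose that there exists a nonnegative measurable function $h:G_1\times G_2\times K\to[0,\infty[$ and a set $H\subset G_1\times G_2$ such that $\mathbf{E}(1_H(\gamma_1,\gamma_2)) = 1$ and for every $(g_1,g_2)\in H$

$$\Xi^K_{12}(g_1,g_2,A) = \int 1_A(k)\,h(g_1,g_2,k)\,\Xi^K_1(g_1,dk)\quad\text{for all } A\in\mathcal{B}(K).$$

Then there is $H'\subset G_1\times K$ with $\mathbf{E}(1_{H'}(\gamma_1,\kappa)) = 1$ so that for all $(g_1,k)\in H'$

$$\Xi^G_{1K}(g_1,k,B) = \int 1_B(g_2)\,h(g_1,g_2,k)\,\Xi^G_1(g_1,dg_2)\quad\text{for all } B\in\mathcal{B}(G_2).$$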

We now recall two results of von Weizsäcker that are of central importance in our proofs. The first result is a special case of the result in [24], pages 95 and 96.

Lemma A.3. Let $G$, $G'$ and $H$ be Polish spaces, and denote by $g$, $g'$ and $h$ the canonical projections from $G\times G'\times H$ on $G$, $G'$ and $H$, respectively. Let $\mathbf{Q}$ be a probability measure on $G\times G'\times H$, and let $q_{\cdot,\cdot}:G\times G'\times\mathcal{B}(H)\to[0,1]$ and $q_\cdot:G\times\mathcal{B}(G'\times H)\to[0,1]$ be versions of the regular conditional probabilities $\mathbf{Q}[h\in\cdot\,|g,g']$ and $\mathbf{Q}[(g',h)\in\cdot\,|g]$, respectively. Then for $\mathbf{Q}$-a.e. $x\in G$, the kernel $q_{x,g'}[\,\cdot\,]$ is a version of the regular conditional probability $q_x[h\in\cdot\,|g']$.

Though the second result is not given precisely in this form in [24], its 
proof follows easily from [24] modulo minor modifications (see also [20], 
Section 4.1). 

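Lemma A.4. Let $G$ and $H$ be Polish spaces, let $(X_n)_{n\ge 0}$ be a sequence of random variables with values in $G$ and let $Y$ be a random variable with values in $H$ on some underlying probability space $(\Omega,\mathcal{F},\mathbf{P})$. Define the $\sigma$-field $\mathcal{K} = \sigma\{Y\}$ and the decreasing filtration $\mathcal{G}_n = \sigma\{X_k : k\ge n\}$. Then

$$\bigcap_{n\ge 0}\mathcal{K}\vee\mathcal{G}_n = \mathcal{K},\quad\mathbf{P}\text{-a.s.}$$

if and only if

$$\bigcap_{n\ge 0}\mathcal{G}_n\text{ is }\mathbf{P}_Y\text{-trivial},\quad\mathbf{P}\text{-a.s.},$$

where $\mathbf{P}_Y$ is a version of the regular conditional probability $\mathbf{P}((X_n)_{n\ge 0}\in\cdot\,|\mathcal{K})$.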
APPENDIX B: NOTATION LIST

The following list of frequently used notation is included for easy reference.

  $E$                      State space of unobservable component
  $F$                      State space of observable component
  $P$                      Transition kernel of $(X_n,Y_n)_{n\in\mathbb{Z}}$
  $P'$                     Transition kernel of the reversed model $(X_{-n},Y_{-n})_{n\in\mathbb{Z}}$
                           Conditional transition kernel of $(X_n)_{n\in\mathbb{Z}}$ given $(Y_n)_{n\in\mathbb{Z}}$
  $P_0$                    Reference kernel on $E$ such that $P\sim P_0\otimes Q$ (Assumption 2.8)
  $P^X$                    Version of $\mathbf{P}(X_1\in\cdot\,|X_0)$ (Lemma 3.1)
  $P^Y$                    Version of $\mathbf{P}(Y_1\in\cdot\,|Y_0)$ (Lemma 3.1)
  $Q$                      Reference kernel on $F$ such that $P\sim P_0\otimes Q$ (Assumption 2.8)
  $U$                      Filter recursion (Lemma 2.3; cf. Section 6.1)
  $X_n$                    Unobservable component of model
  $Y$                      Observation path $(Y_k)_{k\in\mathbb{Z}}$
  $Y_n$                    Observable component of model
  $\Gamma$                 Transition kernel of $(\Pi_n,Y_n)_{n\ge 0}$ (Lemma 2.3; cf. Section 6.1)
  $\Lambda$                Transition kernel of $(\Pi_n,X_n,Y_n)_{n\ge 0}$ (cf. Section 7.1)
  $\Omega$                 Canonical path space
  $\Omega^X$               Canonical path space of unobservable component
  $\Omega^Y$               Canonical path space of observable component
  $\Pi_n$                  The nonlinear filter $\mathbf{P}(X_n\in\cdot\,|\mathcal{F}^Y_{[0,n]})$ (cf. Lemma 5.1)
  $\Theta$                 The canonical shift on $\Omega$
  $\mathbf{P}$             Law of $(X_n,Y_n)_{n\in\mathbb{Z}}$
  $\mathbf{P}^Y$           Law of the observations $(Y_n)_{n\in\mathbb{Z}}$
  $\mathbf{P}^\mu$         $\int\mathbf{P}^{z,w}\,\mu(dz,dw)$
  $\mathbf{P}^{z,w}$       Conditional law of $(X_n,Y_n)_{n\ge 0}$ given $X_0 = z$, $Y_0 = w$
  $\mathbf{P}_{z,y}$       Conditional law of $(X_n)_{n\ge 0}$ given $X_0 = z$, $Y = y$
  $\mathcal{B}(G)$         Borel $\sigma$-field of $G$
  $\mathcal{F}$            Borel $\sigma$-field of $\Omega$
  $\mathcal{F}^Z_+$        $\mathcal{F}^Z_{[0,\infty[}$
  $\mathcal{F}^Z_-$        $\mathcal{F}^Z_{]-\infty,0]}$
  $\mathcal{F}_I$          $\mathcal{F}^X_I\vee\mathcal{F}^Y_I$
  $\mathcal{F}^Z_I$        $\sigma\{Z_k : k\in I\}$
  $\mathcal{P}(G)$         Space of probability measures on $G$
  $\mu_w$                  Version of $\mathbf{P}^\mu(X_0\in\cdot\,|Y_0)$
  $\pi$                    Invariant measure of $(X_n,Y_n)_{n\in\mathbb{Z}}$
  $\pi^X$                  Version of $\mathbf{P}(Y_0\in\cdot\,|X_0)$ (Lemma 3.1)
  $\pi^Y$                  Version of $\mathbf{P}(X_0\in\cdot\,|Y_0)$ (Lemma 3.1)
                           Conditional law of $X_0$ given $(Y_n)_{n\in\mathbb{Z}}$
                           Conditional law of $X_0$ given $(Y_n)_{n\ge 0}$
  $bm$                     Barycenter of $m$
  $g$                      Transition density of $P$ (Assumption 2.8)